ABSTRACT

The distribution of galaxies in position and velocity around the centers of galaxy clusters encodes 
important information about cluster mass and structure. Using the maxBCG galaxy cluster catalog 
identified from imaging data obtained in the Sloan Digital Sky Survey, we study the BCG-galaxy 
velocity correlation function. By modeling its non-Gaussianity, we measure the mean and scatter in 
velocity dispersion at fixed richness. The mean velocity dispersion increases from 202 ± 10 km s^-1
for small groups to more than 854 ± 102 km s^-1 for large clusters. We show the scatter to be at
most 40.5 ± 3.5%, declining to 14.9 ± 9.4% in the richest bins. We test our methods in the C4 cluster 
catalog, a spectroscopic cluster catalog produced from the Sloan Digital Sky Survey DR2 spectroscopic 
sample, and in mock galaxy catalogs constructed from N-body simulations. Our methods are robust, 
measuring the scatter to well within one-sigma of the true value, and the mean to within 10%, in the 
mock catalogs. By convolving the scatter in velocity dispersion at fixed richness with the observed 
richness space density function, we measure the velocity dispersion function of the maxBCG galaxy 
clusters. Although velocity dispersion and richness do not form a true mass-observable relation, 
the relationship between velocity dispersion and mass is theoretically well characterized and has low 
scatter. Thus our results provide a key link between theory and observations up to the velocity bias 
between dark matter and galaxies. 

Subject headings: galaxies: clusters: general — cosmology — methods: data analysis 



1. INTRODUCTION 

Galaxy clusters play an important role in observations of the large-scale structure of the Universe. As dramatically non-linear features in the matter distribution, they stand out as individually identifiable objects, whose abundant galaxies and hot X-ray emitting gas provide a rich variety of observable properties. Clusters can be identified by their galaxy content (Bahcall et al. 2003; Miller et al. 2005; Koester et al. 2007a,b; Gladders & Yee 2005; Gerke et al. 2005; Berlind et al. 2006), their thermal X-ray emission (Rosati et al. 1998; Böhringer et al. 2000; Popesso et al. 2004), the Sunyaev-Zeldovich decrement they produce in the microwave background signal (Grego et al. 2000; Lancaster et al. 2005), or the weak lensing signature they produce in the shapes of distant background galaxies (Wittman et al. 2006). Each of these identification methods also produces proxies for mass: e.g., the number of galaxies, total stellar luminosity, galaxy velocity dispersion, X-ray luminosity and temperature, or SZ and weak lensing profiles.

Simulations of the formation and evolution of large-scale structure through gravitational collapse provide us with rich predictions for the expected matter distribution within a given cosmology (Evrard et al. 2002; Springel et al. 2005). These predictions include not only
first-order features, like the halo mass function n(M,z), 
but higher-order correlations as well, like the precise way 
in which galaxy clusters are themselves clustered as a 
function of mass. Comparisons of these theoretical pre- 
dictions to the observed Universe provide an excellent op- 
portunity to test our understanding of cosmology and the 
formation of large-scale structure. The weak point in this 
chain is that simulations most reliably predict the dark 
matter distribution, while observations are most directly 
sensitive to luminous galaxies and gas. Connections be- 
tween observable properties and theoretical predictions 
for dark matter have often been made through simplify- 
ing assumptions that are hard to justify a priori. 

Progress toward solving this problem has been made by the construction of various mass-observable scaling relations, which are based on combinations of theoretical predictions and observational measurements (Levine et al. 2002; Dahle 2006; Stanek et al. 2006). However, knowledge of the mean mass at a fixed value of the observable is not sufficient to extract precise cosmological constraints, given the exponential shape of the halo mass function (Lima & Hu 2004, 2005). To perform precision cosmology, we must understand the scatter in the mass-observable relations as well.

Recently, Stanek et al. (2006) measured the scatter in the temperature-luminosity relation for X-ray selected galaxy clusters and used it to infer the scatter in the mass-luminosity relationship. Unfortunately, there are relatively few measurements of the scatter in any mass-observable relationship in the optical. (An exception is an early observation of scatter in the velocity-dispersion-richness relationship for a small sample of massive clusters by Mazure et al. 1996.) For optically selected clusters, the scatter is usually included as a parameter in the analysis (e.g., Gladders et al. 2007; Rozo et al. 2007a,b).

The primary goal of this work is to develop a method 
to estimate both the mean and scatter in the cluster 
velocity-dispersion-richness relation. This comparison 
between two observable quantities can be made without 
reference to structure formation theory. The method developed is applied to the SDSS maxBCG cluster catalog, a photometrically selected catalog with extensive spectroscopic follow-up. These methods are tested extensively with both the C4 catalog (Miller et al. 2005), a smaller spectroscopically selected sample of clusters, and new mock catalogs generated by combining N-body simulations with a prescription for galaxy population (Wechsler et al. 2007, in preparation).

Ultimately, we aim to connect richness to mass through 
measurements of velocity dispersion. While the link 
between dark matter velocity dispersion and mass is 
known from N-body simulations to have very small scatter (Evrard et al. 2007), the relationship between galaxy
and dark matter velocity dispersion (the velocity bias) re- 
mains uncertain and will require additional study. Con- 
straints on the normalization and scatter of the total 
mass-richness relation obtained by this method are thus 
limited by uncertainty in the velocity bias. 

An outline of the SDSS data and simulations used in this work is presented in § 2. In § 3 we provide a brief overview of the maxBCG cluster finding algorithm and the properties of the detected cluster sample. We focus in this paper on measurements of the BCG-galaxy velocity correlation function (BGVCF), which is introduced, along with the various fitting methods we employ, in § 4. Section 5 presents a new method for understanding the scatter in the optical richness-velocity dispersion relation and the computation of the velocity dispersion function for the maxBCG clusters. Section 6 presents measurements of the BGVCF as a function of various cluster properties. We connect our velocity dispersion measurements to mass in § 7. Finally, we conclude and discuss future directions in § 8.

2. SDSS DATA AND MOCK CATALOGS 

2.1. SDSS Data 

Data for this study are drawn from the SDSS (York et al. 2000; Abazajian et al. 2003, 2004, 2005; Adelman-McCarthy et al. 2006), a combined imaging and spectroscopic survey of ~10^4 deg^2 in the North Galactic Cap, and a smaller, deeper region in the South. The imaging survey is carried out in drift-scan mode in the five SDSS filters (u, g, r, i, z) to a limiting magnitude of r < 22.5 (Fukugita et al. 1996; Gunn et al. 1998; Smith et al. 2002). Galaxy clusters are selected from ~7500 square degrees of available SDSS imaging data, and from the mock catalogs described below, using the maxBCG method (Koester et al. 2007b), which is outlined in § 3.

The spectroscopic survey targets a "main" sample of galaxies with r < 17.8 and a median redshift of z ~ 0.1 (Strauss et al. 2002) and a "luminous red galaxy" sample (Eisenstein et al. 2001), which is roughly volume limited out to z = 0.38 but extends further, to z = 0.6. The "main" sample comprises about 90% of the catalog, with the "luminous red galaxy" sample making up the rest. Velocity errors in the redshift survey are ~30 km s^-1. We use the SDSS DR5 spectroscopic catalog, which includes over 640,000 galaxies. The mask for our spectroscopic catalog was taken from the New York University Value-Added Galaxy Catalog (Blanton et al. 2005).

2.2. Mock Galaxy Catalogs 

In order to understand the robustness of our meth- 
ods for measuring the mean and scatter of the relation 
between cluster velocity dispersion and richness, we per- 
form several tests on realistic mock galaxy catalogs. Be- 
cause the maxBCG method relies on measurements of 
galaxy positions, luminosities, and colors and their clus- 
tering, these catalogs must reproduce these aspects of 
the SDSS data in some detail. 

In this work, we use mock catalogs created by the 
ADDGALS (Adding Density-Determined Galaxies to 
Lightcone Simulations) method (described by Wechsler 
et al. 2007, in preparation) which is specifically de- 
signed for this purpose. These catalogs populate a 
dark matter light-cone simulation with galaxies using an 
observationally-motivated biasing scheme. Galaxies are 
inserted in these simulations at the locations of individ- 
ual dark matter particles, subject to several empirical 
constraints. The clustering of dark matter particles with a given local over-density (measured on a mass scale of ~10^13 M_⊙) is related to the two-point correlation function of these particles. This connection is used to assign subsets of particles to galaxies using a probability distribution P(δ|L_r), chosen to reproduce the luminosity-dependent correlation function of galaxies as measured in the SDSS by Zehavi et al. (2005). The number of galaxies of a given brightness placed within the simulations is determined from the measured SDSS r-band galaxy luminosity function (Blanton et al. 2003). We consider galaxies brighter than 0.4 L*, because these are the galaxies counted in the maxBCG richness estimate. Finally, colors are assigned to each galaxy by measuring its local galaxy density in redshift space and assigning to it the colors of a real SDSS galaxy with similar luminosity and local density (see also Tasitsiomi et al. 2004). The local density measure used is based on the distance to the fifth nearest neighbor galaxy in a magnitude and redshift slice; for SDSS galaxies it is taken from a volume-limited sample of the CMU-Pitt DR4 Value Added Catalog^1.
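The 0.4 L* threshold fixes how many galaxies per unit volume the mock must contain. As an illustration only, the sketch below integrates a Schechter luminosity function above 0.4 L*. The Schechter form and the parameter values (α = -1.05, φ* = 1.49 × 10^-2 h^3 Mpc^-3) are assumptions chosen for the example, not numbers taken from this paper, and `schechter_number_density` is a hypothetical helper.

```python
import numpy as np

def schechter_number_density(phi_star, alpha, x_min, x_max=50.0, n_steps=200_001):
    """Number density of galaxies with L > x_min * L_star, assuming a Schechter
    function phi(x) dx = phi_star * x**alpha * exp(-x) dx with x = L / L_star.

    Integrates on a logarithmic grid with the trapezoid rule; the tail above
    x_max is neglected (exp(-50) is negligible).
    """
    x = np.geomspace(x_min, x_max, n_steps)
    y = x ** alpha * np.exp(-x)
    integral = 0.5 * np.sum((y[1:] + y[:-1]) * np.diff(x))
    return phi_star * integral

# Illustrative, assumed parameters (phi_star in h^3 Mpc^-3); not from this paper.
n_bright = schechter_number_density(phi_star=1.49e-2, alpha=-1.05, x_min=0.4)
print(f"n(>0.4 L*) ~ {n_bright:.4f} h^3 Mpc^-3")
```

A logarithmic grid is used because the integrand varies fastest at small x, where a faint-end slope near -1 makes it steep.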

This method produces mock galaxy catalogs whose 
galaxies reproduce several properties of the observed 
SDSS galaxies. In particular, they follow the empir- 
ical galaxy color-density relation and its evolution, a 
property of fundamental importance for ridgeline-based 
cluster detection methods. The process accounts for k- 
corrections between rest and observed frame colors and 
assigns realistic photometric errors. Each mock galaxy is 
associated with a dark matter particle and adopts its 3D 
motion. This is important, as it encodes in the motions 
of the mock galaxies the full dynamical richness of the 
N-body simulation. Galaxies may occupy fully virialized regions, be descending into clusters for the first time, or be slowly streaming along nearby filaments. This complete sampling of the velocity field around fully realized N-body halos is essential, because it allows us to use these mock catalogs to predict directly the velocity structure we ought to observe in the data. Note that this simulation process, by design, assumes no velocity bias between the dark matter and the luminous galaxies, except for the BCG: the brightest galaxy assigned to a given halo is artificially placed at its dynamical center.

^1 Available at www.sdss.org/dr4/products/value_added.

In this work, we use two mock catalogs based on different simulations. The first is based on the Hubble Volume simulation (the MS lightcone of Evrard et al. 2002), which has a particle mass of 2.25 × 10^12 h^-1 M_⊙, while the second is based on a simulation run at Los Alamos National Laboratory (LANL) using the Hashed-Oct-Tree code (Warren 1994). This simulation tracks the evolution of 384^3 particles of mass 6.67 × 10^11 h^-1 M_⊙ in a box of side length 768 h^-1 Mpc, and is referred to as the "higher resolution" simulation in what follows. Both simulations have cosmological parameters Ω_m = 0.3, Ω_Λ = 0.7, h = 0.7, and σ_8 = 0.9.
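The quoted particle mass follows from the box size, particle number, and Ω_m. A minimal consistency check, assuming equal-mass particles that together carry the mean matter density Ω_m ρ_crit, with ρ_crit = 2.775 × 10^11 h^2 M_⊙ Mpc^-3 (a standard value assumed here); `particle_mass` is a hypothetical helper:

```python
RHO_CRIT_0 = 2.775e11  # critical density today, h^2 M_sun Mpc^-3 (assumed standard value)

def particle_mass(omega_m, box_size, n_per_side):
    """Mass per particle in h^-1 M_sun for a periodic box of side box_size
    (h^-1 Mpc) filled with n_per_side**3 equal-mass particles."""
    return omega_m * RHO_CRIT_0 * box_size ** 3 / n_per_side ** 3

# LANL "higher resolution" run: 384^3 particles in a 768 h^-1 Mpc box, Omega_m = 0.3.
m_lanl = particle_mass(0.3, 768.0, 384)
print(f"{m_lanl:.3e} h^-1 M_sun")  # 6.660e+11, within ~0.2% of the quoted 6.67e11
```

Because (768/384)^3 = 8, the result is simply 8 Ω_m ρ_crit, which makes the check easy to verify by hand.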

In addition to the galaxy list we have a list of dark matter halos, defined using a spherical over-density cluster finder (e.g., Evrard et al. 2002). By running the cluster finding algorithm on the mock catalogs, we connect clusters detected "observationally" from their galaxy content with simulated dark matter halos in a direct way.

2.3. Velocity Bias 

Given that there is still substantial uncertainty in the amount of velocity bias for various galaxy samples, we must be careful to avoid velocity-bias-dependent conclusions. The mock catalogs with which we are comparing
do not explicitly include velocity bias. Fortunately, we 
will only incur errors due to velocity bias when we es- 
timate masses or directly compare velocity dispersions 
measured in the mock catalogs to those measured in the 
data. Therefore in most of our analysis, velocity bias 
has no effect. Where it is relevant, we choose to leave 
velocity bias as a free parameter because of its current 
observational and theoretical uncertainties. 

Observational uncertainties in velocity bias arise sim- 
ply because it is exceedingly difficult to measure. To 
make such a measurement, one usually requires two in- 
dependent determinations of mass, one of them based on 
dynamical measurements, each subject to systematic and random errors (e.g., Carlberg 1994). Another technique was recently used by Rines et al. (2006), namely constraining the velocity bias by measuring the virial mass function and comparing it to other independent cosmological constraints. Their analysis resulted in a bias of b_v ~ 1.1-1.3. Unfortunately, this technique folds in systematic errors from the other analyses.

In the past, theoretical predictions of velocity bias were affected by numerical over-merging and low resolution (e.g., Frenk et al. 1996; Ghigna et al. 2000). Most estimates of velocity bias based on high-resolution N-body simulations have given b_v ~ 1.0-1.3 (Colín et al. 2000; Ghigna et al. 2000; Diemand et al. 2004; Faltenbacher et al. 2005), depending partly on the mass regime studied. Recent theoretical work has shown that differing methods of subhalo selection in N-body simulations change the derived velocity bias. In particular, Faltenbacher & Diemand (2006) have shown that when subhalos are selected by their properties at the time of accretion onto their hosts (a model which also better matches the observed two-point clustering; see Conroy et al. 2006), they are consistent with being unbiased with respect to the dark matter. Still, understanding velocity bias with confidence will require more observational and theoretical study. As a result, we leave velocity bias as a free parameter where assumptions are required.
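To see why an uncertain velocity bias matters once masses are involved, write σ_gal = b_v σ_DM and assume σ_DM ∝ M^α. The exponent α ≈ 1/3 used below is an assumption based on simple virial scaling (σ^2 ∝ M/R with R ∝ M^(1/3)), not a value from this paper. A mass inferred by setting b_v = 1 is then too large by a factor b_v^(1/α). The helper name is hypothetical:

```python
def mass_ratio_from_velocity_bias(b_v, alpha=1.0 / 3.0):
    """Factor by which a mass inferred assuming b_v = 1 exceeds the true mass
    when galaxies actually have velocity bias b_v = sigma_gal / sigma_DM.

    Assumes sigma_DM scales as M**alpha, so M scales as sigma_DM**(1 / alpha).
    """
    return b_v ** (1.0 / alpha)

for b_v in (1.0, 1.1, 1.3):
    print(b_v, round(mass_ratio_from_velocity_bias(b_v), 3))
```

Under this scaling, the range b_v ~ 1.0-1.3 quoted above corresponds to more than a factor of two in inferred mass, which is why the velocity bias is kept as a free parameter.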

3. THE MAXBCG CLUSTER CATALOG 
3.1. The maxBCG Cluster Detection Algorithm 

The maxBCG cluster detection algorithm identifies clusters as significant over-densities in position-color space (Koester et al. 2007a,b). It relies on the fact that massive clusters are dominated by bright, red, passively evolving ellipticals, known as the red sequence (Gladders & Yee 2000). In addition, it exploits the spatial clustering of red-sequence galaxies and the presence of a cD-like brightest cluster galaxy (BCG). The brightest of the red-sequence galaxies form a color-magnitude relation, the E/S0 ridgeline (Annis et al. 1999), whose color is a strong function of redshift. Thus, in addition to reliably detecting clusters, maxBCG also returns accurate photometric redshifts. The details of the algorithm can be found in Koester et al. (2007b).

The primary parameter returned by the maxBCG cluster detection algorithm is N_gal^R200, the number of E/S0 ridgeline galaxies dimmer than the BCG, within ±0.02 in redshift (as estimated by the algorithm), and within a scale radius R_200^gal (Hansen et al. 2005):

$$R_{200}^{\rm gal} = \left(156\,h^{-1}\,{\rm kpc}\right) \times N_{\rm gal}^{0.6}, \qquad (1)$$

where N_gal is the number of E/S0 ridgeline galaxies dimmer than the BCG, within ±0.02 in redshift, and within 1 h^-1 Mpc.

The value of R_200^gal is defined by Hansen et al. (2005) as the radius at which the galaxy number density of the cluster is 200 Ω_m^-1 times the mean galaxy space density. This radius may not be physically equivalent to the standard R_200, defined as the radius within which the mean total matter density of the cluster is 200 times the critical density.
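The two radii can be contrasted directly. The sketch below evaluates Eq. (1) for a given N_gal and, separately, the matter-based R_200 for a given mass. Using the z = 0 critical density (2.775 × 10^11 h^2 M_⊙ Mpc^-3) is an assumption that ignores the redshift evolution of ρ_crit, and both function names are hypothetical:

```python
import math

RHO_CRIT_0 = 2.775e11  # critical density at z = 0, h^2 M_sun Mpc^-3 (assumed)

def r200_gal(n_gal):
    """Eq. (1): R_200^gal = 0.156 h^-1 Mpc * N_gal**0.6,
    with N_gal counted within 1 h^-1 Mpc of the BCG."""
    return 0.156 * n_gal ** 0.6

def r200_matter(m200):
    """Radius (h^-1 Mpc) enclosing a mean density of 200 rho_crit(z = 0)
    for a mass m200 given in h^-1 M_sun."""
    return (3.0 * m200 / (4.0 * math.pi * 200.0 * RHO_CRIT_0)) ** (1.0 / 3.0)

print(f"{r200_gal(10):.3f} h^-1 Mpc")       # about 0.621
print(f"{r200_matter(1e14):.3f} h^-1 Mpc")  # about 0.755
```

In the maxBCG procedure, N_gal^R200 is then the count of ridgeline galaxies within the R_200^gal returned by the first function; the second function only illustrates the matter-based definition and is not used to compute richness.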

In the work below, we also use the results of Sheldon et al. (2007, in preparation) and Johnston et al. (2007, in preparation), who calculate R_200 from weak lensing analysis of stacked maxBCG clusters. Johnston et al. (2007) show that these weak lensing measurements can be non-parametrically inverted to obtain three-dimensional, average mass profiles. In the context of the halo model, these mass profiles are fit with a one-halo and a two-halo term. The best-fit NFW profile (Navarro et al. 1997), which comprises the one-halo term, is used to measure R_200. Several systematic errors are accounted for, including non-linear shear, cluster mis-centering, and the contribution of the BCG light (modeled as a point mass).

It is notable that the redshift estimates for the 
maxBCG cluster sample are quite good. They can be 
tested with SDSS data by comparing them to spectro- 
scopic redshifts for a large number of BCG galaxies ob- 
tained as part of the SDSS itself. The photometric redshift errors are a function of cluster richness, varying from δz ≈ 0.02 for systems of a few galaxies to δz < 0.01 for rich clusters (Koester et al. 2007a).
Koester et al. (2007a,b) estimate the completeness (the fraction of real dark matter halos identified) as a function of halo mass, and the purity (the fraction of identified clusters which correspond to real dark matter halos) as a function of N_gal^R200, by running the detection algorithm defined above on the ADDGALS simulations. The maxBCG cluster catalog is shown to have a completeness greater than 90% for dark matter halos above a mass of 2 × 10^14 M_⊙, and a purity greater than 90% for detected clusters with observed richness greater than N_gal^R200 = 10. The selection function has been further characterized for use in cosmological constraints by Rozo et al. (2007a).
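The two selection statistics defined above reduce to simple fractions once clusters and halos have been matched. The sketch assumes that matching has already been done and is summarized by boolean flags (a hypothetical data layout); the toy inputs are made-up values, not results from the simulations:

```python
import numpy as np

def completeness(halo_mass, halo_is_matched, m_min):
    """Fraction of halos with mass >= m_min that were matched to a detected cluster."""
    sel = np.asarray(halo_mass) >= m_min
    return float(np.mean(np.asarray(halo_is_matched)[sel]))

def purity(cluster_richness, cluster_is_matched, n_min):
    """Fraction of detected clusters with richness >= n_min that were matched to a halo."""
    sel = np.asarray(cluster_richness) >= n_min
    return float(np.mean(np.asarray(cluster_is_matched)[sel]))

# Toy inputs (made-up values); the cluster-halo matching is assumed already done.
halo_mass = np.array([5e13, 2e14, 3e14, 1e15])   # M_sun
halo_matched = np.array([False, True, True, True])
richness = np.array([4, 12, 15, 30])             # N_gal^R200
cluster_matched = np.array([False, True, False, True])
print(completeness(halo_mass, halo_matched, 2e14))  # 1.0
print(purity(richness, cluster_matched, 10))        # 2/3
```

Note the asymmetry: completeness is conditioned on a true halo property (mass), while purity is conditioned on an observed one (richness).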

We finally note that clusters at lower redshift are more easily identified directly from the spectroscopic sample (e.g., the C4 catalog, Miller et al. 2005, or the catalog of Berlind et al. 2006), but are limited in number due to the bright flux limit of the spectroscopic sample. Clusters
at redshifts higher than 0.3 can be identified easily in 
SDSS photometric data, but measurement of their rich- 
nesses, locations, and redshifts in a uniform way becomes 
increasingly difficult as their member galaxies become 
faint. Future studies similar to the one described in this 
paper will be possible as the maxBCG method is pushed 
to higher redshift and higher redshift spectroscopy is ob- 
tained. 



3.2. The Cluster Catalog 

The published catalog (Koester et al. 2007a) includes a total of 13,823 clusters from ~7500 square degrees of the SDSS, with 0.1 < z < 0.3 and richnesses N_gal^R200 ≥ 10. For this study, we extend the range of this catalog to 0.05 < z < 0.31 and N_gal^R200 > 3. The lower redshift bound allows us to include more of the SDSS spectroscopy, which peaks in density around z ~ 0.1. The extended catalog used in this study sacrifices the well-understood selection function of the maxBCG clusters for the extra spectroscopic coverage and thus improved statistics. The lower richness cut additionally allows us to probe a wider range of cluster and group masses. This larger sample has a total of 195,414 clusters and groups.

The selection function has only been well characterized (by Koester et al. 2007a,b and Rozo et al. 2007a) for the maxBCG catalog presented in Koester et al. (2007a). The broader redshift range and lower richness limit considered in this study are not covered by those studies, primarily because we expect the color selection to produce less complete samples at low richness: since the red fraction in clusters and groups decreases with decreasing mass, maxBCG may be biased against the bluest low-mass groups.

Requiring sufficient spectroscopic coverage for each cluster, defined in § 4.1 in the context of the construction of the BCG-galaxy velocity correlation function, significantly restricts the sample of clusters studied here, because the spectroscopic coverage of the SDSS is limited in comparison with its photometric coverage. Most of the maxBCG clusters above z ≈ 0.2 contribute relatively little to the BGVCF. The final cluster sample includes only 12,253 clusters. A total of 57,298 of the more than 640,000 SDSS DR5 galaxy redshifts are used in this study.



4. THE VELOCITY DISPERSION-RICHNESS RELATION 

To compare cluster catalogs derived from data to theo- 
retical predictions of the cluster mass function, we must 
examine cluster observables which are related to mass. 
For individual clusters, the primary mass indicators we 
have for this photometrically-selected catalog are based 
on observations of galaxy content. Some of the observable parameters include N_gal, total optical luminosity L_opt, and comparable parameters measured within observationally scaled radii, N_gal^R200 and L_opt^R200. To understand
the relationship between these various richness measures 
and cluster mass, we can refer to several observables more 
directly connected to mass: the dynamics of galaxies, X- 
ray emission, and weak lensing distortions the clusters 
produce in the images of background galaxies. In this 
work we concentrate on the extraction of dynamical in- 
formation from the maxBCG cluster catalog. Weak lens- 
ing measurements of this cluster catalog are described by 
Sheldon et al. (2007, in preparation) and Johnston et al. 
(2007, in preparation). An analysis of the average X-ray 
emission by maxBCG clusters is in preparation (Rykoff 
et al. 2007). Preliminary cosmological constraints from 
this catalog, based only on cluster counts, have been presented by Rozo et al. (2007b); these will be extended with the addition of these various mass estimators.

4.1. Extracting Dynamical Information from Clusters: 
the BCG-Galaxy Velocity Correlation Function

Using the SDSS spectroscopic catalog, we can learn 
about the dynamics of the maxBCG galaxy clusters. For 
this sample, drawn from a redshift range from 0.05 to 
0.31, the spectroscopic coverage of cluster members is 
generally too sparse to allow for direct measurement of 
individual cluster velocity dispersions. We instead focus 
here on the measurement of the mean motions of galax- 
ies as a function of cluster richness. We study these mo- 
tions by first constructing the BCG-galaxy velocity correlation function, ξ(δv, r, P_cl, P_gal), hereafter the BGVCF.

To construct the BGVCF, we identify those clusters for which a BCG spectroscopic redshift has been measured. We then search for other galaxies with spectroscopic redshifts contained within a cylinder in redshift-projected separation space which is ±7,000 km s^-1 deep and has a radius of one R_200, which varies as a function of N_gal^R200, as measured by Johnston et al. (2007, in preparation). For each such spectroscopic neighbor we form a "pair", recording the velocity separation of the pair, δv; their projected separation at the BCG redshift, r; information about the properties of each galaxy (the BCG and its neighbor), P_gal; and information about the cluster in which the BCG resides, P_cl. This pair structure contains the observational information relevant to the BGVCF.
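The cylinder search above can be sketched for a single BCG as follows. This is not the authors' code: the argument layout is hypothetical, the angular-diameter distance d_a to the BCG is assumed to be supplied by the caller, the angular offset uses a flat-sky approximation with no RA wrap-around handling, and δv is taken as the rest-frame velocity difference c (z_gal - z_BCG)/(1 + z_BCG):

```python
import numpy as np

C_KM_S = 299792.458  # speed of light, km/s

def find_bcg_pairs(bcg_z, bcg_ra, bcg_dec, gal_z, gal_ra, gal_dec,
                   r200, d_a, dv_max=7000.0):
    """Velocity and projected separations of the spectroscopic neighbours of one BCG.

    Angles are in degrees; r200 and d_a (angular-diameter distance to the BCG,
    supplied by the caller) are in h^-1 Mpc. The neighbour arrays must not
    contain the BCG itself.
    """
    gal_z = np.asarray(gal_z, dtype=float)
    # Rest-frame line-of-sight velocity difference relative to the BCG.
    dv = C_KM_S * (gal_z - bcg_z) / (1.0 + bcg_z)
    # Flat-sky angular offset, converted to a projected distance at the BCG redshift.
    dra = np.radians(np.asarray(gal_ra, dtype=float) - bcg_ra) * np.cos(np.radians(bcg_dec))
    ddec = np.radians(np.asarray(gal_dec, dtype=float) - bcg_dec)
    r = d_a * np.hypot(dra, ddec)
    # Keep neighbours inside the cylinder |dv| <= dv_max and r <= r200.
    inside = (np.abs(dv) <= dv_max) & (r <= r200)
    return dv[inside], r[inside]

# Toy usage with made-up numbers: only the first neighbour lies inside the cylinder.
dv, r = find_bcg_pairs(0.1, 180.0, 0.0,
                       gal_z=[0.101, 0.13, 0.1],
                       gal_ra=[180.01, 180.0, 181.0],
                       gal_dec=[0.0, 0.0, 0.0],
                       r200=1.0, d_a=265.0)
print(dv, r)
```

The cluster properties P_cl and galaxy properties P_gal can be carried alongside each returned (δv, r) pair as extra columns; they are omitted here to keep the sketch short.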

The quantities P_gal and P_cl will change depending on the context in which we are considering the BGVCF. Some examples of P_cl include N_gal, L_opt, N_gal^R200, L_opt^R200, local environmental density, and R_200. Examples of P_gal include the magnitude differences between members and the BCG, the BCG i-band luminosity, and stellar velocity dispersion. The mean of δv is consistent with zero, so the BGVCF is independent of the sign of δv. In Figure 1 we show the BGVCF of the catalog in two bins of N_gal^R200, one with N_gal^R200 < 5 (left panel) and one with N_gal^R200 > 15 (right panel). The structure of the BGVCF clearly changes with richness.

Fig. 1. The projected separation and velocity separation for the pairs of galaxies in clusters with N_gal^R200 > 15 (right) and N_gal^R200 < 5 (left). There is a clear change in the BGVCF with N_gal^R200.

When we stack clusters to measure their velocity dis- 
persion as described below, the statistical properties of 
our sampling of the BGVCF determine the errors in 
our measurements. Figures 2(a)-(d) show the number of pairs in the BGVCF per cluster as a function of N_gal^R200, plotted for the entire BGVCF and in three redshift bins. As the redshift of the bins increases, the number of spectroscopic pairs becomes a less faithful reflection of the cluster's N_gal^R200. Clusters at lower redshift tend to have more pairs, as expected. Figure 2 shows that if we want to measure the velocity dispersion of individual clusters, we are limited to low redshift and high richness, because only these clusters are sufficiently well sampled by the SDSS spectroscopic data.

4.2. Characterizing the BGVCF of Stacked Clusters 

In this work, we are primarily concerned with the mag- 
nitude of the velocity dispersion and its scatter at fixed 
richness, as well as its dependence on the properties of 
clusters and their galaxies. To greatly simplify our analysis, we now integrate the BGVCF radially to produce the pairwise velocity difference histogram (PVD histogram). Strictly speaking, we do not produce a true PVD histogram, because the only pairs we consider are those between BCGs and non-BCGs around the same cluster (i.e., between each BCG and all other galaxies in the BGVCF around that cluster). We do not include non-BCG to non-BCG pairs.

Fig. 2. The number of spectroscopic pairs in the BGVCF per cluster as a function of N_gal^R200, for different redshift ranges. The redshift ranges for each panel are indicated; the first panel includes the entire catalog. Clusters with lower redshifts and higher richness are better sampled by the spectroscopic survey. The points in this diagram are displaced randomly from their integral values (i.e., {1, 2, ...}) so that the true density can be seen.
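Integrating the BGVCF radially amounts to discarding r and histogramming δv over all BCG-galaxy pairs in a stack. A minimal sketch, assuming each cluster's pairs arrive as a separate array of δv values (as a cylinder search would produce) and using a bin width of 500 km s^-1 chosen purely for illustration; `pvd_histogram` is a hypothetical name:

```python
import numpy as np

def pvd_histogram(dv_per_cluster, dv_max=7000.0, bin_width=500.0):
    """Stack BCG-galaxy velocity differences from many clusters into one histogram.

    dv_per_cluster : iterable of 1-D arrays, one per cluster, of BCG-galaxy
                     velocity differences in km/s (BCG-to-member pairs only).
    Projected separations are ignored, i.e. the BGVCF is integrated over r.
    """
    dv_all = np.concatenate([np.asarray(dv, dtype=float) for dv in dv_per_cluster])
    edges = np.arange(-dv_max, dv_max + bin_width, bin_width)
    counts, edges = np.histogram(dv_all, bins=edges)
    return counts, edges

# Toy usage: two clusters with made-up velocity offsets.
counts, edges = pvd_histogram([np.array([100.0, -300.0, 2500.0]),
                               np.array([-50.0, 6000.0])])
print(counts.sum(), len(counts))
```

This version counts every pair equally, so clusters with many spectroscopic neighbours dominate the stack; a per-cluster weighting changes that balance.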

Ideally, if every cluster had a properly selected BCG 
and all BCGs were at rest with respect to the center-of- 
mass of the cluster, our measurements of the mean veloc- 
ity dispersion would be unbiased with respect to the true 
center-of-mass velocity dispersion. Unfortunately, these 
simplifying assumptions are not likely to be true. In 
particular, it has been found that BCGs move on average with ~25% of the cluster's velocity dispersion, but that at higher mass BCG movement becomes more significant (e.g., van den Bosch et al. 2005). In § 5.2.1 we show that a correction must be applied to our mean velocity dispersions due to centering on the BCG (hereafter called BCG bias), but that we cannot distinguish between improperly selected BCGs and BCG movement. However, we still focus on the BCG in the measurements of the BGVCF, because it is a natural center for the cluster in the context of the maxBCG cluster detection algorithm.

Having decided to concentrate on the PVD histogram, we now motivate the construction of a fitting algorithm for the PVD histogram. Previous work by McKay et al. (2002), Prada et al. (2003), and others (Brainerd & Specian 2003; van den Bosch et al. 2004; Conroy et al. 2005, 2007) has focused on measuring the halo mass of isolated galaxies by using dynamical measurements. McKay et al. (2002) found the velocity dispersion around these galaxies by stacking them in luminosity bins and fitting a Gaussian curve plus a constant, representing the constant interloper background, to the stacked PVD histogram. In this method, the standard deviation of the fitted Gaussian curve is taken as an estimate of the mean velocity dispersion of the stacked groups.
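A Gaussian-plus-constant fit of this kind can be sketched with plain least squares. For a fixed width σ the model A exp(-v^2 / 2σ^2) + B is linear in A and B, so those are solved exactly and σ is found by a grid search. The zero mean, the unweighted least squares (a Poisson-weighted fit may be preferable), and the σ search range are assumptions of this sketch, not details of McKay et al. (2002):

```python
import numpy as np

def fit_gauss_plus_constant(counts, edges, sigma_grid=None):
    """Least-squares fit of A * exp(-v^2 / (2 sigma^2)) + B to a PVD histogram.

    The Gaussian mean is fixed at zero. For each trial sigma the model is linear
    in (A, B), which are solved exactly; the best sigma is then chosen on a grid.
    """
    counts = np.asarray(counts, dtype=float)
    edges = np.asarray(edges, dtype=float)
    centers = 0.5 * (edges[1:] + edges[:-1])
    if sigma_grid is None:
        sigma_grid = np.arange(50.0, 2000.0, 5.0)  # km/s, assumed search range
    best = None
    for sigma in sigma_grid:
        design = np.column_stack([np.exp(-0.5 * (centers / sigma) ** 2),
                                  np.ones_like(centers)])
        coeffs, *_ = np.linalg.lstsq(design, counts, rcond=None)
        resid = np.sum((design @ coeffs - counts) ** 2)
        if best is None or resid < best[0]:
            best = (resid, sigma, coeffs[0], coeffs[1])
    _, sigma, amp, background = best
    return sigma, amp, background

# Toy example: a noiseless histogram with sigma = 400 km/s on a flat background.
edges = np.arange(-7000.0, 7001.0, 250.0)
centers = 0.5 * (edges[1:] + edges[:-1])
counts = 100.0 * np.exp(-0.5 * (centers / 400.0) ** 2) + 5.0
sigma_1g, amp_1g, bg_1g = fit_gauss_plus_constant(counts, edges)
print(sigma_1g, amp_1g, bg_1g)
```

The returned sigma plays the role of the "standard deviation of the fitted Gaussian curve" in the method described above.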

The algorithm presented above is insufficient for our purposes for the following reason. The PVD histogram of stacked galaxy clusters, shown in Figure 3, is clearly non-Gaussian. Although the width of a single Gaussian curve likely still provides some information about the typical dispersion of the sample, it cannot adequately capture the information contained in the non-Gaussian shape of the PVD histogram. As we show in § 5, although the PVD histogram for a stack of clusters with similar velocity dispersions is expected to be nearly Gaussian, there are multiple sources of non-Gaussianity in the stacked PVD histograms. To adequately characterize this non-Gaussianity, one of the primary goals of this paper, we must use a better algorithm to fit the PVD histograms.
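One source of non-Gaussianity can be illustrated with a toy Monte Carlo. Assume each cluster's own PVD is an exactly Gaussian, zero-mean distribution, with no interlopers and no BCG offsets, and that ln σ is normally distributed with width s across the stacked clusters. The stack is then a Gaussian scale mixture whose excess kurtosis is 3 E[σ^4]/E[σ^2]^2 - 3 = 3(exp(4 s^2) - 1), which is positive whenever s > 0. The function names and parameter values are hypothetical:

```python
import numpy as np

def stacked_excess_kurtosis(scatter_ln, sigma_median=500.0, n_clusters=200_000, seed=0):
    """Monte Carlo excess kurtosis of a stack of zero-mean Gaussian PVDs whose
    dispersions are lognormally distributed with width scatter_ln in ln(sigma)."""
    rng = np.random.default_rng(seed)
    sigma = sigma_median * np.exp(scatter_ln * rng.standard_normal(n_clusters))
    v = sigma * rng.standard_normal(n_clusters)  # one pair per cluster: equal cluster weights
    m2, m4 = np.mean(v ** 2), np.mean(v ** 4)     # raw moments; the true mean is zero
    return m4 / m2 ** 2 - 3.0

def analytic_excess_kurtosis(scatter_ln):
    """3 E[sigma^4] / E[sigma^2]^2 - 3 for lognormal sigma, i.e. 3 (exp(4 s^2) - 1)."""
    return 3.0 * (np.exp(4.0 * scatter_ln ** 2) - 1.0)

for s in (0.0, 0.2, 0.4):
    print(s, round(stacked_excess_kurtosis(s), 2), round(analytic_excess_kurtosis(s), 2))
```

The heavier tails and sharper peak grow quickly with s, which is why the shape of the stacked histogram, not just its width, carries information about the scatter in σ at fixed richness.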

In this work we mention a variety of different methods of fitting the PVD histograms. We give their names and definitions here and follow with a full description of the primary method used, 2GAUSS. The various methods are:

1GAUSS: This is the method used for isolated galaxies as discussed above. We do not use it because, as described in § 4.2.2, it systematically underestimates the second moment of the PVD histogram by ~8%.

2GAUSS: This method is the one motivated and described in detail below. It is the primary method used throughout the rest of the paper. Simply, it fits the PVD histogram with two Gaussians and a constant background term, but with a special weighting by cluster and not by galaxy (see § 5).

Fig. 3. PVD histograms in four N_gal^R200 bins. The EM algorithm fits are shown along with the Poisson errors of the histograms. Notice the change in the degree of non-Gaussianity of the lower N_gal^R200 bins compared to the higher N_gal^R200 bins. As richness increases, the stacked PVD histogram becomes more Gaussian. Section 5 shows that this decrease in non-Gaussianity indicates a decrease in the width of the lognormal distribution of velocity dispersions in each bin. The deviations of the fits near the centers of the distributions are only at the two-sigma level and are highly dependent on the bin size used to produce the cluster-weighted PVD histograms.

NGAUSS: This is a generalized version of the 2GAUSS 
method with N Gaussians instead of two (e.g., a three-Gaussian fit will be denoted by 3GAUSS).
Although it fits the PVD histogram as well as the 
2GAUSS method, it is more computationally ex- 
pensive, and adds parameter degeneracies without 
substantially improving the quality of the fit. 

NONPAR: There are several possible methods for using non-parametric fits to the PVD histogram (e.g., kernel density estimators). We do not use them because they do not naturally account for the constant interloper background in the PVD histogram. For a good review of these techniques see Wasserman et al. (2001).

BISIGMA: This method is not used for the PVD histograms of stacked clusters, but is used for the PVD histograms of individual clusters. The biweight is a robust estimator of the standard deviation that is appropriate for use with samples of points which contain interlopers (Beers et al. 1990). See §4.3 for a description of its use in this paper.

BAYMIX: This method is a Bayesian or maximum likelihood method that can be used in the context of the model of the scatter in velocity dispersion at fixed richness. This method will be described fully in §5, but we do not use it in this paper because we have found it to be unstable.

We will refer to these methods by the names given above. Although we mention these other methods, for deriving the main results of the paper we use the 2GAUSS method for stacked cluster samples (§4.2.1) and the BISIGMA method for individual clusters (§4.3).

4.2.1. Fitting the PVD Histogram 

In order to more fully capture the shape of the PVD 
histogram of stacked clusters, avoid systematic fitting er- 
rors, and avoid fitting degeneracies, we would ideally use 
the NONPAR method to fit the PVD histogram. In this 
way we would impose no particular form on the PVD his- 
togram, allowing us to extract its true shape with as few 
assumptions as possible. However, this method does not naturally account for the interloper background term of the PVD histogram, which can be easily fit by a constant (Wojtak et al. 2006).

In the pursuit of simplicity, we compromise by fitting the PVD histogram of stacked clusters with two Gaussian curves plus a constant background term. The two Gaussians share a single mean, which is a free parameter; in all cases the fitted mean is consistent with zero. The two Gaussian curves allow us to more fully capture the shape of the PVD histogram, while the constant term still accounts for the interloper background of the BGVCF. It could be that the shape of the PVD histogram cannot be satisfactorily accounted for by two Gaussians; we show in §4.2.2 that two Gaussians are sufficient to describe it. Using the 2GAUSS method instead of the NGAUSS method avoids expensive computations and limits the number of parameters in the fitting procedure to six, avoiding degeneracies in the fit parameters due to limited statistics.

In the interest of fitting stability and ease of use (but sacrificing speed), we use the expectation maximization (EM) algorithm for one-dimensional Gaussian mixtures to fit the PVD histogram (Dempster et al. 1977; Connolly et al. 2000). In Appendix A we re-derive the EM algorithm for one-dimensional Gaussian mixtures such that it assigns every Gaussian the same mean and weights groups of galaxies, not individual galaxies, evenly. This last step is important in the context of the model of the distribution of velocity dispersions at fixed N_gals^R200 discussed in §5. To account for our velocity errors, we use the results of Connolly et al. (2000) and subtract in quadrature the 30 km s^-1 redshift error from the standard deviation of each fitted Gaussian.
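The fitting step described above can be sketched as a weighted EM loop. This is a minimal illustration, not a reproduction of the derivation in Appendix A, and it rests on assumptions made here: each pair receives weight c_i = 1/n_cluster so that every cluster carries equal total weight, the shared mean is updated with the widths held fixed (an ECM-style conditional step), the background is uniform on [-L, L] with L = 7000 km s^-1, and the starting values are illustrative. The function name fit_2gauss is hypothetical.

```python
import numpy as np


def gauss(v, mu, sigma):
    """Normalized 1D Gaussian density."""
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)


def fit_2gauss(v, cluster_id, L=7000.0, v_err=30.0, n_iter=1000, tol=1e-9):
    """Cluster-weighted EM fit of two Gaussians (shared mean) plus a constant background.

    v          : BCG-galaxy velocity differences in km/s, assumed to satisfy |v| <= L
    cluster_id : label of the cluster each pair belongs to
    """
    v = np.asarray(v, dtype=float)
    _, inverse, counts = np.unique(np.asarray(cluster_id), return_inverse=True, return_counts=True)
    c = 1.0 / counts[inverse]          # assumption: each cluster gets total weight 1
    W = c.sum()

    # Illustrative starting values: robust scale from the median absolute deviation.
    mu = np.median(v)
    sigma0 = 1.4826 * np.median(np.abs(v - mu))
    s = sigma0 * np.array([0.7, 1.3])
    w = np.array([0.4, 0.4])           # Gaussian normalizations
    p = 0.2                            # background fraction
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: membership probabilities for each Gaussian and for the background.
        comp = np.vstack([w[k] * gauss(v, mu, s[k]) for k in range(2)])
        bg = p / (2.0 * L)
        total = comp.sum(axis=0) + bg
        r = comp / total
        r0 = bg / total
        loglike = np.sum(c * np.log(total))

        # M-step, every sum weighted by the cluster weights c.
        rc = r * c
        w = rc.sum(axis=1) / W
        p = np.sum(r0 * c) / W
        inv_var = 1.0 / s**2
        mu = np.sum(inv_var[:, None] * rc * v) / np.sum(inv_var[:, None] * rc)
        s = np.sqrt(np.sum(rc * (v - mu) ** 2, axis=1) / rc.sum(axis=1))

        if abs(loglike - prev) < tol:
            break
        prev = loglike

    # Remove the redshift error in quadrature from each fitted width.
    s_corr = np.sqrt(np.clip(s**2 - v_err**2, 0.0, None))
    return {"mu": mu, "sigma": s_corr, "weights": w, "p": p}
```

The second moment of the Gaussian part then follows as sum(w * sigma**2) / sum(w), which is how the two-Gaussian fit feeds the mass mixing model of §5.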

We have described our measurements in the context of the PVD histogram and not the BGVCF. However, these two viewpoints are, in our case, completely equivalent. Wojtak et al. (2006) have shown that galaxies uncorrelated with the cluster in PVD histograms (i.e., interlopers) form a constant background. Thus by fitting a constant term to the PVD histogram, we are in effect statistically subtracting the uncorrelated pairs to retain the BGVCF.

4.2.2. Tests of the 2GAUSS Fitting Algorithm 

In order to measure the moments of the PVD histogram as a function of richness, the data are first binned logarithmically in N_gals^R200, and then the 2GAUSS method is applied to each bin. The results of our fitting on four bins of N_gals^R200 are shown in Figure 3. In all cases, the model provides a reasonable fit to the data. We defer a full discussion of the fitting results to §5, where we show how to compute the mean velocity dispersion and scatter in velocity dispersion at fixed N_gals^R200 using the results of the 2GAUSS fitting algorithm, including corrections for improperly selected BCG centers and/or BCG movement.

To ensure that the use of the 2GAUSS method does not bias our fits, we repeated them using the 1GAUSS, 3GAUSS, and 4GAUSS methods. We find that while the second moment measured from the 1GAUSS fits is consistently lower than that measured from the 2GAUSS fits by approximately 8%, both the second and fourth moments measured by the 2GAUSS, 3GAUSS, and 4GAUSS fits agree to within a few percent. Therefore we conclude that the fits have converged and that two Gaussians plus a constant are sufficient to capture the overall shape of the PVD histogram. The fitting errors are determined using bootstrap resampling over the clusters in each bin.
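Resampling over clusters rather than over individual pairs keeps all pairs from one cluster together, so correlations within a cluster propagate into the error bars. A minimal sketch, assuming the per-bin fit can be wrapped as a function of a list of per-cluster velocity arrays; the cluster-weighted rms used as the statistic here is only a hypothetical stand-in for the full 2GAUSS fit.

```python
import numpy as np


def cluster_bootstrap(clusters, statistic, n_boot=200, seed=0):
    """Mean and standard deviation of `statistic` over bootstrap resamples of clusters.

    clusters  : list of 1D arrays, one array of velocity differences per cluster
    statistic : function mapping a list of such arrays to a number
    """
    rng = np.random.default_rng(seed)
    n = len(clusters)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # draw whole clusters with replacement
        draws[b] = statistic([clusters[i] for i in idx])
    return draws.mean(), draws.std(ddof=1)


def cluster_weighted_rms(clusters):
    """Stand-in statistic: rms velocity with every cluster weighted equally."""
    return np.sqrt(np.mean([np.mean(np.square(x)) for x in clusters]))
```

In practice the statistic would rerun the full fit on each resampled bin, so the cost scales with n_boot times the cost of one fit.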

The results are not dependent on the radial or velocity scale used to construct the BGVCF and thus the PVD histogram. We repeated the 2GAUSS fits using 0.75R200, 1.0R200, 1.25R200, and 1.5R200 projected radial cuts, as well as ±10000 km s^-1, ±5σ scaled, and ±10σ scaled apertures in velocity space. We found no significant differences in the fits using each of the various cuts, with the exception of the value of the background normalization, which changes when a larger aperture allows more background to be included in the PVD histogram. The scaled apertures were made by first determining the velocity-dispersion-richness relation in a fixed aperture, and then rescaling the aperture according to this relation. For example, in a bin of N_gals^R200 from 18 to 20, the velocity dispersion is ~500 km s^-1 as measured in a ±7000 km s^-1 fixed aperture. To make the five-sigma scaled aperture measurements, we used an aperture in this N_gals^R200 bin of ±5 × 500 = ±2500 km s^-1.

Fig. 4. — The individual cluster velocity dispersions for the entire data set. The dispersions were computed using the BISIGMA method on only those clusters with more than ten BGVCF pairs within three-sigma of the BCG (calculated using the mean velocity dispersion at the BCG's N_gals^R200). The dashed line is the geometric mean of the velocity dispersions calculated in the context of the mixing model (§5) without a correction for BCG bias, and the solid line is the fit to all velocity dispersions of the individual clusters. The error bars on the ICVDs are not shown for clarity. The two relations (dashed and solid) agree with each other within one- to two-sigma, but the average bias between them is a real effect.

4.3. Estimating Individual Cluster Velocity Dispersions 

To measure the velocity dispersion of individual clusters in the SDSS, we select all clusters that have at least ten redshifts in their PVD histograms within three sigma, where sigma is given by the mean velocity-dispersion-N_gals^R200 relation calculated in §5. Of the 12,253 clusters represented in the BGVCF, only 634 meet this requirement. We then apply the BISIGMA method, which uses the biweight estimator (Beers et al. 1990), to calculate the velocity dispersion. The resulting individual cluster velocity dispersions (ICVDs) are plotted in Figure 4. The BCG bias manifests itself here in that the ICVDs show a downward bias with respect to the mean relation calculated in §5 but not corrected for BCG bias. We will correct for this bias in §5.2.1. The two relations do, however, agree to within one- to two-sigma (computed through jackknife resampling with the biweight). Using these ICVDs, we can directly compute the scatter in the velocity dispersion at fixed N_gals^R200. We will compare this computation with the estimate based on measuring non-Gaussianity in the stacked sample in §5.
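The BISIGMA step can be sketched with the biweight scale estimator in the form given by Beers et al. (1990), using the conventional tuning constant c = 9 and the median as the location estimate. The selection mirrors the rule above (at least ten pairs within three sigma of the mean relation), but the helper name individual_dispersion, the choice to clip before estimating, and the interface are assumptions for illustration.

```python
import numpy as np


def biweight_scale(x, c=9.0):
    """Biweight scale estimate (Beers et al. 1990 form, median as the location)."""
    x = np.asarray(x, dtype=float)
    m = np.median(x)
    mad = np.median(np.abs(x - m))
    if mad == 0.0:
        return 0.0
    u = (x - m) / (c * mad)
    keep = np.abs(u) < 1.0                     # points with |u| >= 1 get zero weight
    d, uk = x[keep] - m, u[keep]
    num = np.sqrt(np.sum(d**2 * (1.0 - uk**2) ** 4))
    den = np.abs(np.sum((1.0 - uk**2) * (1.0 - 5.0 * uk**2)))
    return float(np.sqrt(x.size) * num / den)


def individual_dispersion(v_pairs, sigma_mean, n_min=10, n_sigma=3.0):
    """Hypothetical ICVD helper: clip at n_sigma * sigma_mean, require n_min pairs."""
    v = np.asarray(v_pairs, dtype=float)
    v = v[np.abs(v) < n_sigma * sigma_mean]
    if v.size < n_min:
        return None                            # too few redshifts for this cluster
    return biweight_scale(v)
```

Unlike the sample standard deviation, the biweight assigns zero weight to points far from the median, so a handful of interlopers at several thousand km s^-1 barely moves the estimate.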

5. MEASURING SCATTER IN THE VELOCITY 
DISPERSION-RICHNESS RELATION 

5.1. Mass Mixing Model 

The subject of non-Gaussianity of pairwise velocity difference histograms has been debated extensively in the literature. Diaferio & Geller (1996) have shown that non-Gaussianity in the total PVD histogram for dark matter halos arises from two sources: stacking halos of different masses according to the mass function, and intrinsic non-Gaussianity in the PVD histogram due to substructure, secondary infall, and dissipation of orbital kinetic energy into subhalo internal degrees of freedom. However, for galaxy clusters, they conclude that the PVD histogram of an individual virialized galaxy cluster is well approximated by a Gaussian.

Sheth (1996) independently reached the same conclusions but did not consider any intrinsic non-Gaussianity, only the effect of stacking halos of different masses according to the mass function. Sheth & Diaferio (2001) gave a more complete synthesis of non-Gaussianity in PVD histograms, generalizing the formalism to include the effect of particle tracer type (halos versus galaxies versus dark matter particles) and extensively considering the effects of local environment. In all three treatments, non-Gaussianity is shown to arise from the stacking of individual PVD histograms which are Gaussian or nearly Gaussian and have some intrinsic distribution of widths. In this paper we use the term "mass mixing" to refer only to non-Gaussianity arising through this process.

It should be noted that observing non-Gaussianity in PVD histograms stacked by richness is equivalent to saying that the richness versus velocity dispersion relation has intrinsic scatter (assuming that the stacked PVD histogram of a set of clusters with similar velocity dispersions is intrinsically Gaussian). Intrinsic scatter in the velocity-dispersion-richness relation was reported earlier by Mazure et al. (1996) for a volume-limited sample of 80 literature-selected clusters with at least 10 redshifts each. Here we seek to quantify this scatter as a function of richness by measuring the non-Gaussianity in the PVD histogram.

For individual galaxy clusters, theoretical work by Iguchi et al. (2005) has shown that violent gravitational collapse in an N-body system may lead to a non-Gaussian velocity distribution. However, Faltenbacher & Diemand (2006) have shown that the velocity distribution of subhalos in a dissipationless N-body simulation is Gaussian (Maxwellian in three dimensions) and shows little bias compared to the diffuse dark matter, if the subhalos are selected by their mass when they enter the host halo rather than by their present-day mass. Finally, Sheth & Diaferio (2001) caution against concluding that the three-dimensional velocity distribution of a galaxy cluster is Maxwellian even if one component is found to be approximately Gaussian. They show that for a slightly non-Maxwellian three-dimensional distribution, departures of the one-dimensional distribution from a Gaussian are much smaller than departures of the three-dimensional distribution from a Maxwellian.

Observationally, the PVD histograms of individual galaxy clusters have been shown to be non-Gaussian in the presence of substructure (e.g., Cortese et al. 2004; Halliday et al. 2004; Girardi et al. 2005). Conversely, Girardi et al. (199?) observed 79 galaxy clusters with at least 30 redshifts each and found no systematic deviations from Gaussianity (although 14 were found to be mildly non-Gaussian at the three-sigma level). Caldwell (1987) showed that once recently accreted galaxies are removed from the sample of redshifts from the Fornax cluster, the PVD histogram becomes Gaussian.

Unfortunately, due to the low number of galaxy redshifts available for a given cluster (as shown earlier), we must make some assumption about the shape of the PVD histogram for a set of stacked clusters with velocity dispersion between σ and σ + dσ in order to proceed with constructing a mass mixing model. If every cluster were sampled sufficiently, the scatter in velocity dispersion at a given value of N_gals^R200 could be directly computed by measuring the velocity dispersions of individual clusters. Since this is not the case, we assume that for a set of stacked clusters with velocity dispersion between σ and σ + dσ, the PVD histogram is Gaussian. This assumption is well justified and is equivalent to the assumption that a large enough portion of the clusters in our catalog are sufficiently relaxed, virialized systems at their centers, so that when we stack them, any asymmetries or substructure are averaged out.

However, we will still be sensitive to substructure 
around the BCG. In fact, we may even be more sensitive 
to substructure around the BCG since we are directly 
stacking clusters on the BCG. Using the ADDGALS 
mock galaxy catalogs, we find that when binning dark 
matter halos with galaxies by both velocity dispersion 
and mass, the resulting stacked PVD histograms are 
Gaussian. This result gives us further confidence that 
the above assumption is reasonable, but it is still possible that this result depends either on the BCG placement or on the galaxy selection of the mock catalogs.

Under the assumption of Gaussianity of the PVD his- 
togram for a stacked set of similar velocity dispersion 
clusters, the non-Gaussianity in the stacked histograms 
can be entirely attributed to the distribution of the ve- 
locity dispersions (or equivalently mass) of the stacked 
clusters. The goal of our analysis is then to extract the 
distribution of velocity dispersions for a given PVD his- 
togram by measuring its deviation from Gaussianity. 

We can now proceed in two distinct ways. First, by writing the PVD histogram as a convolution of a Gaussian curve of width σ for each stacked set of similar velocity dispersion clusters, with some distribution of velocity
dispersions in the stack, we could numerically deconvolve 
this Gaussian out of the PVD histogram to produce the 
distribution of velocity dispersions in the stack. Repeat- 
ing this procedure in various bins on different observ- 
ables, we could then have knowledge of the scatter in 
velocity dispersion as a function of these observables. 

Second, by taking a more model-dependent approach, we could make an educated guess of the distribution of σ as a function of a given set of parameters, and then perform the convolution to predict the shape of the PVD
histogram. By matching the predictions with the obser- 
vations through adjusting the set of parameters, we could 
then have a parameterized model of the entire distribu- 
tion. 

Based on the results shown below in §6, it is apparent that the only parameter upon which σ varies significantly, neglecting the modest redshift dependence, is N_gals^R200. Thus we choose a parametric model that is a function of N_gals^R200 only. The dependence of mass mixing on any secondary parameters (e.g., the BCG z-band luminosity) can then be explored by first binning on N_gals^R200 and then splitting on these secondary parameters, because their effects are small (see §6).

The ADDGALS mock catalogs show the scatter about the mean of the logarithm of the velocity dispersion measured from the dark matter at a given value of N_gals^R200 to be approximately Gaussian for dark matter halos; that is, the velocity dispersion is approximately lognormally distributed at fixed richness. Using this distribution as our educated guess, we apply this model to the data in logarithmic bins of N_gals^R200. We avoid the deconvolution due to its inherent numerical difficulty. Using the mock catalogs, we can test our method of determining the parameters of this model as a function of N_gals^R200 by applying our analysis to clusters identified in the mocks and then matching those clusters to halos in order to determine their true velocity dispersions.

To summarize, there are two and possibly even three sources of non-Gaussianity in our PVD histograms (similar to those discussed by Sheth 1996, Diaferio & Geller 1996, and Sheth & Diaferio 2001, as described above): (1) intrinsic non-Gaussianity in the PVD histogram of an individual galaxy cluster, (2) the range of velocity dispersions that contribute to the PVD histogram at a given value of N_gals^R200 (mass mixing), and (3) the stacking of clusters with different values of N_gals^R200 in the same PVD histogram. We handle the last two sources of non-Gaussianity jointly through the model below and ignore the first, which is expected to be small, both because most clusters are relaxed, virialized systems and because many are stacked together here.

As a final note, by binning logarithmically in N_gals^R200 and then measuring the mass mixing, we only approximately account for non-Gaussianity arising from clusters with different richnesses in the same bin. By avoiding binning altogether and finding the model parameters through a maximum likelihood approach, we could remedy this issue. This approach is the BAYMIX method. However, we have found this process to be computationally expensive and slightly unstable due to the integral in equation (3) below. Its only true advantage is in the computation of the errors in the parameters and their covariances. Using a maximum likelihood approach, one could calculate the full covariance matrix of the parameters introduced below. As will be shown below, binning in N_gals^R200 and then measuring the mass mixing allows us to easily find the covariance matrices of the parameters only in sets of two. Since we are not significantly concerned with the exact form of these errors or the covariances of the parameters, we choose to bin for simplicity.

Future analysis of this sort with PVD histograms will 
hopefully take a less model-dependent approach by de- 
convolving a Gaussian directly from the stacked PVD 
histogram. This will allow for a direct confirmation of 
the distribution of velocity dispersions at fixed richness. 
Also, a direct deconvolution would allow one to make 
mass mixing measurements of stacks of clusters binned 
on any observable. Although the lognormal form as- 
sumed here may in fact have wider applicability, we can 
only confirm its use for clusters stacked by richness. 

5.2. Results Using The Mass Mixing Model 

We can write the shape of the non-background part of the stacked cluster-weighted PVD histogram, P(v), as

$$P(v) = \int p(v, \sigma)\, d\sigma = \int p(v \mid \sigma)\, p(\sigma)\, d\sigma, \tag{2}$$

where v is the velocity separation value and σ is the Gaussian width of a stacked set of clusters with similar velocity dispersions. Using the assumptions from the previous discussion, we let p(v|σ) be a Gaussian of width σ with mean zero, and p(σ) be a lognormal distribution. Performing the convolution, we get that P(v) is given by

$$P(v) = \int_0^\infty \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{v^2}{2\sigma^2}\right) \frac{1}{\sqrt{2\pi}\,S\,\sigma} \exp\left[-\frac{\left(\ln\sigma - \langle\ln\sigma\rangle\right)^2}{2S^2}\right] d\sigma, \tag{3}$$

where exp(<ln σ>) is the geometric mean of σ (so that <ln σ> is the mean of ln σ) and S is the standard deviation of ln σ. We note that the quantity 100 × S is the percent scatter in σ. The second and fourth moments of this PVD distribution, μ(2) and μ(4), are given by

$$\mu(2) = \exp\left(2\langle\ln\sigma\rangle + 2S^2\right) \tag{4}$$

and

$$\mu(4) = 3\exp\left(4\langle\ln\sigma\rangle + 8S^2\right). \tag{5}$$

For convenience we define the normalized kurtosis to be

$$\gamma \equiv \frac{\mu(4)}{3\,\mu(2)^2} = \exp\left(4S^2\right). \tag{6}$$



Note that the odd moments of this distribution are expected to vanish, and in fact the data are consistent with both the first and third moments being zero. Equation (4) shows why the velocity dispersions derived directly from the second moment of the PVD histogram must be corrected: the square root of μ(2) equals exp(<ln σ>) exp(S²), so the factor exp(S²) artificially increases the velocity dispersions. In practice this effect is at most ~20% at low richness and declines to ~5% for the most massive clusters in our sample.
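The closed-form moments of the lognormal mass mixing model, μ(2) = exp(2<ln σ> + 2S²), μ(4) = 3 exp(4<ln σ> + 8S²), and γ = μ(4)/[3 μ(2)²] = exp(4S²), can be checked numerically by drawing σ from the lognormal, drawing one velocity per σ, and comparing sample moments with these expressions. A minimal sketch; the parameter values are illustrative, not fits to the data. For reference, S = 0.405 gives exp(S²) ≈ 1.18, in line with the ~20% inflation quoted for the poorest bins.

```python
import numpy as np


def mass_mixing_moments(mean_ln_sigma, S):
    """Closed-form mu(2), mu(4) and normalized kurtosis of the lognormal mixture."""
    mu2 = np.exp(2.0 * mean_ln_sigma + 2.0 * S**2)
    mu4 = 3.0 * np.exp(4.0 * mean_ln_sigma + 8.0 * S**2)
    return mu2, mu4, mu4 / (3.0 * mu2**2)


def sample_stacked_pvd(mean_ln_sigma, S, n, seed=0):
    """Draw v from a zero-mean Gaussian whose width sigma is lognormally distributed."""
    rng = np.random.default_rng(seed)
    sigma = np.exp(rng.normal(mean_ln_sigma, S, size=n))
    return rng.normal(0.0, sigma)


m, S = np.log(400.0), 0.3
mu2, mu4, gamma = mass_mixing_moments(m, S)
v = sample_stacked_pvd(m, S, 1_000_000)
print(np.mean(v**2) / mu2, np.mean(v**4) / mu4)   # both close to 1
print(np.sqrt(mu2) / np.exp(m), np.exp(S**2))      # equal: the exp(S^2) inflation
```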

To complete our model, we need a term corresponding to the background of the PVD histogram. This background has two parts: an uncorrelated interloper component and an infall component (i.e., galaxies which are not in virial equilibrium but are bound to the cluster in the infall region). Wojtak et al. (2006) have shown that the uncorrelated background is a constant in the PVD histogram, while van den Bosch et al. (2004) have shown that the infall component is not constant and forms a wider component in PVD histograms around isolated galaxies. We ignore the possible infall components in our PVD histograms but note that they may bias the widths of our lognormal distributions high. Investigation of this issue in the mock catalogs shows that the result of van den Bosch et al. (2004) holds for galaxy clusters as well. Although we do not explore this here, it may be possible to reduce the mass mixing signal from infalling galaxies by selecting galaxies by color (e.g., red galaxies only or just maxBCG cluster members), which preferentially selects galaxies near the centers of the clusters.

Accounting for the constant interloper background, the full model of the cluster-weighted PVD histogram, 𝒫(v), can now be written as

$$\mathcal{P}(v) = \frac{p}{2L} + (1 - p)\,P(v), \tag{7}$$

where L is the maximum allowed separation in velocity between the BCG and the cluster members, set to 7000 km s^-1, and p is a weighting factor that sets the background level in the PVD histogram. Here we ignore the small error in the normalization due to integrating P(v) over v from -∞ to ∞ instead of from -L to L. As long as L is sufficiently large, say on the order of 4 exp(<ln σ>) for a given PVD histogram, this error is small.
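The mixture integral of equation (3) and the full model of equation (7) can be evaluated without any deconvolution by Gauss-Hermite quadrature over ln σ, and the normalization error mentioned above can be estimated directly as the fraction of P(v) lying beyond ±L. A minimal sketch; the quadrature approach, the node count, and the helper names are choices made here, not part of the original analysis.

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss


def lognormal_mixture_pdf(v, mean_ln_sigma, S, n_nodes=40):
    """P(v) of equation (3): a zero-mean Gaussian averaged over lognormal sigma."""
    t, w = hermgauss(n_nodes)                             # nodes/weights for exp(-t^2)
    sigma = np.exp(mean_ln_sigma + np.sqrt(2.0) * S * t)  # ln sigma = <ln sigma> + sqrt(2) S t
    v = np.asarray(v, dtype=float)[..., None]
    gauss = np.exp(-0.5 * (v / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)
    return (gauss * w).sum(axis=-1) / np.sqrt(np.pi)


def full_model(v, mean_ln_sigma, S, p, L=7000.0):
    """Equation (7): constant background p/(2L) plus (1 - p) P(v), zero outside [-L, L]."""
    v = np.asarray(v, dtype=float)
    model = p / (2.0 * L) + (1.0 - p) * lognormal_mixture_pdf(v, mean_ln_sigma, S)
    return np.where(np.abs(v) <= L, model, 0.0)


def truncation_loss(mean_ln_sigma, S, L, n_nodes=40):
    """Fraction of P(v) lying outside [-L, L], i.e. the neglected normalization error."""
    t, w = hermgauss(n_nodes)
    sigma = np.exp(mean_ln_sigma + np.sqrt(2.0) * S * t)
    tail = np.array([math.erfc(L / (math.sqrt(2.0) * s)) for s in sigma])
    return float(np.sum(w * tail) / np.sqrt(np.pi))
```

Comparing truncation_loss at L = 7000 km s^-1 with its value at L = 4 exp(<ln σ>) shows how quickly the neglected tail shrinks as the aperture grows.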

Now it can be seen why the stacked PVD histogram must be weighted by cluster instead of by galaxy. In equation (3), equal weight is assigned to each cluster because p(v|σ) is a Gaussian normalized to integrate to unity. In order to predict the pair-weighted PVD histogram correctly, we would have to predict the total number of BGVCF pairs, both cluster and background, as a function of N_gals^R200 and include this total in the integral and the background term. (The factors of p and 1 - p take care of the weighting of the background relative to the cluster, assuming this weighting is the same for every cluster. This may not be true, in which case the factors of p and 1 - p would have to be included in the integral as well.) This is a significant problem due to its dependence on redshift, local environment, and the selection function of the survey.

By using the cluster-weighted PVD histogram fit by the EM algorithm derived in Appendix A (i.e., the 2GAUSS method), we can avoid this issue. This weighting could be included in the BAYMIX method as well; we choose to use the 2GAUSS method because it is more stable and less computationally expensive. In practice the extra weighting factors do not change our results drastically, indicating that most clusters already get approximately equivalent weight even in the pair-weighted PVD histogram. However, for completeness we include the weighting factor. Note that two is the smallest number of Gaussians a mixture can contain and still have a normalized kurtosis different from unity (the normalized kurtosis of a single Gaussian distribution is unity). According to equation (6), then, if we measure a normalized kurtosis of unity for any of our bins, the mass mixing in that bin (i.e., S) is zero.

The quantities <ln σ>, S², and p are measured by binning the data in N_gals^R200 and applying the 2GAUSS method as described earlier. This method outputs the constant background level p automatically. The normalized kurtosis is calculated as

$$\gamma = \frac{(p_1 + p_2)\left(p_1\sigma_1^4 + p_2\sigma_2^4\right)}{\left(p_1\sigma_1^2 + p_2\sigma_2^2\right)^2} \tag{8}$$

and the second moment is calculated as

$$\mu(2) = \frac{p_1\sigma_1^2 + p_2\sigma_2^2}{p_1 + p_2}, \tag{9}$$

where {p1, p2} and {σ1, σ2} are the normalizations and standard deviations calculated for the two Gaussians in the 2GAUSS PVD histogram fit. See Appendix A for more details. Although equation (8) is not properly normalized, we find that the bias correction computed in Appendix B is small, and thus equation (8) is a good estimator of the normalized kurtosis.

Using equations (4), (6), (8), and (9), we solve for the parameters <ln σ> and S². The background normalization p and the normalization and scatter in the velocity-dispersion-richness relation are all modeled as power laws, which provide a good description of the relations in both the data and the simulations:

$$\langle\ln\sigma\rangle = A + B\ln\left(N_{\rm gals}^{R200}/25\right), \tag{10}$$

$$S^2 = C + D\ln\left(N_{\rm gals}^{R200}/25\right), \tag{11}$$

$$\ln p = E + F\exp\left(\langle\ln\sigma\rangle\right). \tag{12}$$

TABLE 1

MaxBCG mass mixing model fit parameters.

Parameter | Value
mean-normalization, A | 6.17 ± 0.04
mean-slope, B | 0.436 ± 0.015
scatter-normalization, C | 0.096 ± 0.014
scatter-slope, D | -0.0241 ± 0.0050
background-normalization, E | -0.980 ± 0.052
background-slope, F | -0.00154 ± 0.00018
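Given the 2GAUSS output for one bin, the second moment μ(2) = (p1σ1² + p2σ2²)/(p1 + p2) and the normalized kurtosis γ = (p1 + p2)(p1σ1⁴ + p2σ2⁴)/(p1σ1² + p2σ2²)² can be inverted in closed form using μ(2) = exp(2<ln σ> + 2S²) and γ = exp(4S²): S² = ln(γ)/4 and <ln σ> = [ln μ(2) - 2S²]/2. A minimal sketch of that step, assuming the fitted widths already have the redshift error removed; the function names and example numbers are hypothetical, and the Monte Carlo bias corrections of Appendix B are not applied.

```python
import numpy as np


def moments_from_2gauss(p1, p2, s1, s2):
    """Second moment and normalized kurtosis of the two-Gaussian part of the fit."""
    mu2 = (p1 * s1**2 + p2 * s2**2) / (p1 + p2)
    gamma = (p1 + p2) * (p1 * s1**4 + p2 * s2**4) / (p1 * s1**2 + p2 * s2**2) ** 2
    return mu2, gamma


def invert_mass_mixing(mu2, gamma):
    """Solve mu(2) = exp(2<ln sigma> + 2 S^2) and gamma = exp(4 S^2) for <ln sigma>, S."""
    S2 = max(np.log(gamma) / 4.0, 0.0)   # gamma >= 1 for any two-Gaussian mixture
    mean_ln_sigma = 0.5 * (np.log(mu2) - 2.0 * S2)
    return mean_ln_sigma, np.sqrt(S2)


# Illustrative (made-up) fit output for one bin: normalizations and widths in km/s.
mu2, gamma = moments_from_2gauss(0.45, 0.30, 300.0, 650.0)
mean_ln_sigma, S = invert_mass_mixing(mu2, gamma)
print(np.exp(mean_ln_sigma), 100 * S)   # geometric-mean dispersion and percent scatter
```

Two equal widths give γ = 1 and hence S = 0, which is the single-Gaussian limit noted above.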



The fits of the measured parameters <ln σ> and S² to the above relations for the maxBCG cluster sample are shown in Figure 5. The parameters A, B, C, D, E, and F are given in Table 1, and the mass mixing model values for each bin are given in Table 2. The mean relation plotted in this figure is corrected for BCG bias due to improperly selected BCGs and/or BCG movement, as discussed in §5.2.1. The errors for our data points are derived from the bootstrap errors employed in the 2GAUSS method. Note that we only have knowledge of the full error distributions of the parameters in sets of two, and are thus neglecting covariance between, for example, parameters A and C or B and C. The BAYMIX method would allow each of the three relations to be fit simultaneously, giving full covariances between the parameters.
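The best-fit relations can be evaluated at any richness from the Table 1 central values. A minimal sketch, assuming the relations take the forms <ln σ> = A + B ln(N_gals^R200/25), S² = C + D ln(N_gals^R200/25), and ln p = E + F exp(<ln σ>), the last being the sign convention under which the negative F in Table 1 reproduces the decline of p with richness seen in Table 2. Parameter uncertainties and covariances are ignored, and the function name is hypothetical.

```python
import numpy as np

# Central values from Table 1 (uncertainties and covariances ignored).
A, B, C, D, E, F = 6.17, 0.436, 0.096, -0.0241, -0.980, -0.00154


def mass_mixing_relations(n_gals):
    """Geometric-mean dispersion (km/s), percent scatter, and background level p."""
    x = np.log(np.asarray(n_gals, dtype=float) / 25.0)
    mean_ln_sigma = A + B * x
    S2 = np.clip(C + D * x, 0.0, None)      # the scatter relation is linear in S^2
    sigma_geo = np.exp(mean_ln_sigma)
    p = np.exp(E + F * sigma_geo)
    return sigma_geo, 100.0 * np.sqrt(S2), p


sigma_geo, pct, p = mass_mixing_relations([3.0, 25.0, 87.8])
```

At the pivot richness N_gals^R200 = 25 this gives exp(A) ≈ 478 km s^-1 and 100√C ≈ 31%.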

5.2.1. BCG Bias in the 2GAUSS Fitting Algorithm

Despite the fact that the two mean relations plotted in Figure 4 agree within one- to two-sigma, we show in this section that the bias between the two relations is significant and arises from two sources. The first source is intrinsic statistical bias in the 2GAUSS method itself. Using Monte Carlo tests as described in Appendix B, we find that this bias is approximately 3-5% downward and has some slight dependence on the number of data points used in the 2GAUSS fitting method. The Monte Carlo computation of the bias is shown in the middle panel of Figure 6. We call this bias b_2G.

The second source of bias is due to some combination of BCG movement with respect to the parent halo (see van den Bosch et al. 2005) and the incorrect selection of BCGs by the maxBCG cluster detection algorithm (i.e., mis-centering). We can test for this effect by reconstructing the BGVCF around randomly selected cluster member galaxies output from the maxBCG cluster detection algorithm. If the BCGs are picked correctly and are at rest with respect to their parent halos, then by picking a random member galaxy, we should observe the mean velocity dispersion increase by a factor of √2, because the velocity difference between two independent members has twice the variance of a member's velocity relative to a BCG at rest. This calculation assumes that each stack of similar velocity dispersion clusters has a Gaussian PVD histogram. This test is performed on the data in the left panel of Figure 6. We see that the random-member-centered dispersions are increased above <ln σ> for each bin in N_gals^R200, but by less than √2. This result indicates that either or both of the situations discussed above is occurring. The ratio of the random-member-centered dispersions to <ln σ> is denoted r_RM.
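The expected factor of √2 follows from Var(v_i - v_j) = 2σ² for two independent members, compared with Var(v_i - v_BCG) = σ² for a BCG at rest. The toy simulation below illustrates how a BCG with its own peculiar velocity pulls the random-member ratio below √2, to √2/√(1 + f²) for a BCG velocity dispersion of fσ. It is a sketch under simplifying assumptions chosen here: Gaussian member velocities, a Gaussian BCG velocity with dispersion bcg_frac × σ, and no mis-centering or interlopers; the function name is hypothetical.

```python
import numpy as np


def centering_ratio(bcg_frac, sigma=500.0, n_clusters=2000, n_members=30, seed=0):
    """Toy analogue of r_RM: random-member-centred over BCG-centred pair dispersion.

    bcg_frac is a hypothetical BCG peculiar-velocity dispersion in units of sigma
    (0 means the BCG is at rest at the halo centre).
    """
    rng = np.random.default_rng(seed)
    v = rng.normal(0.0, sigma, size=(n_clusters, n_members))
    v_bcg = rng.normal(0.0, bcg_frac * sigma, size=n_clusters)
    d_bcg = v - v_bcg[:, None]          # pairs centred on the BCG
    d_rand = v[:, 1:] - v[:, :1]        # pairs centred on member 0, a random member
    return d_rand.std() / d_bcg.std()


for f in (0.0, 0.3, 0.5):
    print(f, centering_ratio(f), np.sqrt(2.0 / (1.0 + f**2)))
```

Mis-centering on a non-central galaxy would act in the same direction, which is why the test above cannot by itself separate the two effects.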

We can test the above conclusion by using the ICVDs computed in §4.3. To do this, we calculate the ratio of <ln σ> to the geometric average of the ICVDs for each bin in N_gals^R200. This ratio is plotted in the right panel of Figure 6 and is called r_ICVD. We can also estimate this ratio from the computations described in the previous two paragraphs: we compute √2 b_2G/r_RM for each bin in N_gals^R200, and this quantity is also shown in the right panel of Figure 6. This computation assumes that the biases add linearly in the logarithm of the velocity dispersion.

Fig. 5. — The mean velocity dispersion (i.e., exp(<ln σ>)), percent scatter in σ, and background level p measured for the SDSS data. On the lower right, we show the mass mixing model integrated over the entire data set and plotted over the cluster-weighted PVD histogram of the entire data set. We did not fit the stacked PVD histogram directly. The reduced chi-square between the data and the mass mixing model is 1.31, where we have used a robust, optimal bin size given by Izenman (1991). The line fits of equations (10), (11), and (12) are shown as solid lines in the lower left, upper left, and upper right panels, respectively. Note that we have plotted 100 × S in the upper right panel, not the linear relation between S² and ln(N_gals^R200).

TABLE 2

MaxBCG mass mixing parameters by N_gals^R200 bin.

Mean N_gals^R200 | <ln σ> (geometric mean velocity dispersion; σ in km s^-1) | 100 × S (percent scatter in σ) | p (background level)
3.00 | 5.31 ± 0.05 | 40.5 ± 3.5 | 0.276 ± 0.012
4.00 | 5.36 ± 0.03 | 33.6 ± 2.2 | 0.252 ± 0.008
5.00 | 5.42 ± 0.03 | 35.3 ± 2.3 | 0.278 ± 0.013
6.00 | 5.55 ± 0.07 | 36.1 ± 4.6 | 0.266 ± 0.015
7.00 | 5.59 ± 0.04 | 38.3 ± 2.2 | 0.253 ± 0.015
8.00 | 5.81 ± 0.05 | 26.5 ± 5.2 | 0.232 ± 0.015
9.88 | 5.74 ± 0.04 | 40.0 ± 2.1 | 0.237 ± 0.010
14.1 | 5.95 ± 0.04 | 34.5 ± 2.4 | 0.210 ± 0.011
18.9 | 6.13 ± 0.13 | 33.7 ± 8.1 | 0.209 ± 0.033
22.7 | 6.09 ± 0.05 | 39.0 ± 2.3 | 0.187 ± 0.023
28.7 | 6.25 ± 0.05 | 23.5 ± 3.0 | 0.149 ± 0.015
35.9 | 6.29 ± 0.07 | 28.9 ± 4.9 | 0.164 ± 0.026
44.7 | 6.47 ± 0.09 | 20.2 ± 3.6 | 0.156 ± 0.036
58.4 | 6.47 ± 0.09 | 34.5 ± 2.5 | 0.137 ± 0.030
87.8 | 6.75 ± 0.12 | 14.9 ± 9.4 | 0.072 ± 0.026



We find that generally √2 b_2G/r_RM ≈ r_ICVD within the one-sigma errors. This observation indicates that our explanation of the bias observed in Figure 4 is self-consistent. To correct the <ln σ> values for each bin in N_gals^R200, we use the mean of the quantity √2 b_2G/r_RM, because the ICVDs are limited to low-redshift, better-sampled clusters and our measurements are quite noisy.
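The correction step can be sketched as follows, assuming per-bin arrays of b_2G and r_RM are available. The assumptions made here are that the mean is taken over bins, and that the combined factor √2 b_2G/r_RM estimates the ratio of the measured to the true dispersion (as r_ICVD does), so the correction divides the dispersion, i.e. subtracts the logarithm of the mean factor from <ln σ>. The function name and example numbers are hypothetical.

```python
import numpy as np


def apply_bcg_correction(mean_ln_sigma, b2g, r_rm):
    """Shift each bin's <ln sigma> by the log of the mean of sqrt(2) * b_2G / r_RM."""
    q = np.sqrt(2.0) * np.asarray(b2g, dtype=float) / np.asarray(r_rm, dtype=float)
    q_mean = q.mean()                      # a single correction averaged over bins
    corrected = np.asarray(mean_ln_sigma, dtype=float) - np.log(q_mean)
    return corrected, q, q_mean


# Hypothetical per-bin inputs, for illustration only.
corrected, q, q_mean = apply_bcg_correction([5.3, 5.9, 6.5], [0.96, 0.97, 0.95], [1.30, 1.28, 1.33])
```

Because the factors combine multiplicatively in σ, they add linearly in ln σ, which is the assumption stated above for comparing √2 b_2G/r_RM with r_ICVD.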

In Figure 7 we repeat the above computations for the mock catalogs. We again find that √2 b_2G/r_RM ≈ r_ICVD and that our explanation of the bias is self-consistent. Furthermore, since we know the true velocity dispersion values, we can directly test our arguments above in an absolute sense. This comparison is discussed in §5.3.2. We find that, in fact, our correction will likely overcorrect the mean velocity dispersion so that it is 5-10% too low. Briefly, this effect occurs because the random "member" we select is not always a member of the cluster.

Fig. 6. — The bias correction to the mean velocity dispersion for the maxBCG clusters. Left: The ratio of the random-member-centered dispersions to <ln σ>, r_RM. Middle: The statistical bias in the 2GAUSS method, b_2G. See Appendix B for details. Right: The ratio of <ln σ> to the geometric average of the ICVDs for each bin in N_gals^R200, r_ICVD. The circles in the right panel show the quantity √2 b_2G/r_RM for each bin in N_gals^R200. Note that the right panel indicates √2 b_2G/r_RM ≈ r_ICVD.

Fig. 7. — The bias correction to the mean velocity dispersion for the mock catalogs. The panels are the same as those in Figure 6. According to the right panel, √2 b_2G/r_RM ≈ r_ICVD holds in the mock catalogs as well.

Finally, the Monte Carlo tests described in Appendix B
allow us to test for bias in the scatter parameter as well. We find and correct
for bias in this parameter, and note that on average we
measure values of the scatter that are slightly lower than they should be, by
about 5-10%.

5.3. Tests of the Mass Mixing Model 

We now present several checks of our method for estimating
mass mixing. These checks fall into three broad
categories. The first set uses the data itself
and tests for self-consistency and for dependence on
sample selection functions and/or redshift. The second
set uses mock catalogs. Here we run the methods
developed above on the mocks in the same way they
are run on the data, and ask whether we can recover



the true velocity-dispersion-richness relation for halos. 
If the measurements on the mock catalogs do not match 
the true values, then we will suspect that some of the 
assumptions made above are not adequate to sufficiently 
describe the BGVCF (i.e. we might suspect that the 
infall component of the PVD histogram contributes significantly).

The third set of tests uses a spectroscopically selected
catalog run on lower redshift data, the C4 catalog
(Miller et al. 2005). For this sample, we can compute
the distribution of velocity dispersion at fixed richness by 
directly computing velocity dispersions for each individ- 
ual cluster. We can then test our methods by comparing 
the measurements based on the stacked PVD histogram 
to the true measured distributions. 

5.3.1. Data Dependent Tests 

As a first check of our method with the data, we look
for self-consistency. In the lower right panel of Figure 5
we plot the mass mixing model integrated over the entire
data set using equations 8, 9, 10, 11, and 12 on top of
the full cluster-weighted stacked PVD histogram. We did
not fit the stacked PVD histogram directly. The reduced
chi-square between the data and the mass mixing model
is 1.31, where we have used a robust, optimal bin size
given by Izenman (1991). The model reproduces
the first four moments of the stacked PVD histogram
as a function of $N_{\rm gal}^{R_{200}}$ and reproduces the stacked PVD
histogram itself to a good approximation, indicating that the
model is self-consistent.
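The reduced chi-square quoted above depends on the histogram binning. Izenman (1991) reviews robust, data-based bin-width rules; the sketch below assumes the Freedman-Diaconis rule, width $= 2\,{\rm IQR}\,n^{-1/3}$, as a stand-in for the rule actually used, and compares a histogram with a normalized model density using Poisson variances. The function names, the minimum expected count of 5 per bin, and the toy Gaussian data are illustrative assumptions.

```python
import numpy as np

def fd_bin_width(x):
    """Freedman-Diaconis bin width: 2 * IQR * n**(-1/3)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75.0, 25.0])
    return 2.0 * (q75 - q25) * x.size ** (-1.0 / 3.0)

def reduced_chi2(x, model_pdf, n_params=0, min_expected=5.0):
    """Reduced chi-square of a histogram of x against a normalised model density.

    Uses Poisson variances (equal to the expected counts) and keeps only bins
    whose expected count exceeds min_expected.
    """
    x = np.asarray(x, dtype=float)
    width = fd_bin_width(x)
    edges = np.arange(x.min(), x.max() + width, width)
    counts, edges = np.histogram(x, bins=edges)
    centres = 0.5 * (edges[1:] + edges[:-1])
    expected = x.size * width * model_pdf(centres)
    use = expected > min_expected
    chi2 = np.sum((counts[use] - expected[use]) ** 2 / expected[use])
    return chi2 / (use.sum() - n_params)

rng = np.random.default_rng(0)
v = rng.normal(0.0, 500.0, 50_000)  # toy velocity offsets, km/s
gauss = lambda x, s: np.exp(-0.5 * (x / s) ** 2) / (np.sqrt(2.0 * np.pi) * s)
good = reduced_chi2(v, lambda x: gauss(x, 500.0))  # true model: close to 1
bad = reduced_chi2(v, lambda x: gauss(x, 750.0))   # wrong width: much larger
```

A data-based width adapts to the spread of the sample, so the same code can be applied to stacks of very different richness without hand-tuning the binning.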

In the upper two panels of Figure 8 we compare the
model parameters computed from the ICVDs (diamonds),
which are computed using the BISIGMA method, with those computed
from the stacked PVD histogram (circles). The
two agree to within one sigma. We note, however, that
the relation for the standard deviation of $\ln\sigma$ for the
individual cluster velocity dispersions looks "flatter" as a
function of $N_{\rm gal}^{R_{200}}$ than the relation computed from the
shape of the stacked PVD histogram.

We hypothesize two possible explanations for this observation.
First, the "flatness" could simply be a statistical
fluctuation: according to the error bars, the two
relations differ by less than one standard deviation in most instances.
Second, the "flatness" could be caused by a sampling effect in the
population of clusters used to compute the individual cluster
velocity dispersions. Because we computed the individual
velocity dispersions by requiring a
cluster to have ten pairs in the BGVCF within three
sigma of the BCG, as given by the mean velocity dispersion
relation, we selectively measure only a low redshift
subset of the cluster population.
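A sketch of how an individual cluster velocity dispersion could be computed under the selection just described. It assumes that the BISIGMA method refers to the biweight scale estimator of Beers et al. (1990); the function names, the input layout, and the simple three-sigma window with a ten-pair threshold are a hypothetical reading of the text, and the sketch ignores the interloper and measurement-error treatments a real analysis would need.

```python
import numpy as np

C_KMS = 299792.458  # speed of light, km/s

def los_velocity(z_gal, z_bcg):
    """Rest-frame line-of-sight velocity offset (km/s) of galaxies relative to the BCG."""
    return C_KMS * (np.asarray(z_gal, dtype=float) - z_bcg) / (1.0 + z_bcg)

def biweight_scale(v, c=9.0):
    """Biweight scale estimate in the form given by Beers et al. (1990)."""
    v = np.asarray(v, dtype=float)
    m = np.median(v)
    mad = np.median(np.abs(v - m))
    if mad == 0:
        return 0.0
    u = (v - m) / (c * mad)
    w = np.abs(u) < 1.0  # only points with |u| < 1 contribute
    num = np.sum(((v - m) ** 2 * (1.0 - u ** 2) ** 4)[w])
    den = np.sum(((1.0 - u ** 2) * (1.0 - 5.0 * u ** 2))[w])
    return np.sqrt(v.size * num) / abs(den)

def individual_dispersion(z_gal, z_bcg, sigma_mean, n_min=10, n_clip=3.0):
    """Hypothetical ICVD recipe: keep pairs within n_clip * sigma_mean of the BCG
    and return the biweight scale, or None if fewer than n_min pairs survive."""
    v = los_velocity(z_gal, z_bcg)
    v = v[np.abs(v) < n_clip * sigma_mean]
    if v.size < n_min:
        return None
    return biweight_scale(v)
```

Because the ten-pair requirement is easiest to meet for nearby, well-sampled clusters, any such recipe preferentially returns dispersions for the low redshift subset, which is the sampling effect discussed above.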

This issue is, however, more than just insufficient sampling.
For small groups of galaxies, it may be impossible
to properly define an observationally measurable velocity
dispersion unless one is willing to stack groups of similar
mass to fully sample their velocity distributions. Thus
we hypothesize that, while the two relations disagree at
low richness, the relation computed from the shape of the
stacked PVD histograms may in fact be a better indicator
of scatter in the $\sigma$-$N_{\rm gal}^{R_{200}}$ relation at all richnesses,
especially for low richness clusters.
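Why the shape of the stacked PVD histogram carries information about the scatter can be seen from its moments. As a simplified sketch of the mass mixing idea, assume the stack is a pure mixture of Gaussians whose dispersions are lognormally distributed, $\ln\sigma \sim \mathcal{N}(\mu, S^2)$, and ignore the background component. Then the second moment is $E[\sigma^2] = e^{2\mu + 2S^2}$ and the kurtosis is $3E[\sigma^4]/E[\sigma^2]^2 = 3e^{4S^2}$, so the second moment and the kurtosis together fix $\mu$ and $S$. The function names are illustrative; the actual analysis fits the full histogram rather than inverting two moments.

```python
import numpy as np

def mixture_moments(mu, s):
    """Second moment and kurtosis of a Gaussian scale mixture with ln(sigma) ~ N(mu, s^2)."""
    m2 = np.exp(2.0 * mu + 2.0 * s ** 2)
    kurt = 3.0 * np.exp(4.0 * s ** 2)
    return m2, kurt

def invert_moments(m2, kurt):
    """Recover (mu, s) from the second moment and the (non-excess) kurtosis."""
    s2 = np.log(kurt / 3.0) / 4.0
    mu = 0.5 * np.log(m2) - s2
    return mu, np.sqrt(s2)

def sample_stack(mu, s, n, seed=0):
    """Draw n stacked velocities; each galaxy's host has ln(sigma) ~ N(mu, s^2)."""
    rng = np.random.default_rng(seed)
    sigma = np.exp(rng.normal(mu, s, n))
    return rng.normal(0.0, sigma)

mu_true, s_true = np.log(500.0), 0.3
v = sample_stack(mu_true, s_true, 400_000)
m2_hat = np.mean(v ** 2)
kurt_hat = np.mean(v ** 4) / m2_hat ** 2
mu_hat, s_hat = invert_moments(m2_hat, kurt_hat)
print(mu_hat, s_hat)  # should be close to ln(500) ~ 6.21 and 0.3
```

The kurtosis estimate is noisy because it depends on high moments of the velocity distribution, which is one reason a large stack is needed to constrain the scatter.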

When computing the model above, we used the entire 
Fig. 8. — Tests of the mass mixing model with the data (upper panels) and with the high resolution mock catalogs (lower panels). Upper
Left: The ratio of the geometric mean velocity dispersion determined by the stacked PVD histogram to that determined by the ICVDs.
Upper Right: The percent scatter in $\sigma$ computed directly from the individual cluster velocity dispersions (diamonds) compared to that computed
from the stacked PVD histogram (circles). Lower Left: The ratio of the geometric mean velocity dispersion determined by the stacked
PVD histogram in the high resolution simulation to the true values found by matching clusters to halos. Lower Right: The percent scatter
in $\sigma$ computed using the stacked PVD histogram (circles) compared to the true values found by matching clusters to halos (diamonds).
The error bars for the simulation parameters are jackknife errors computed by breaking the sample into the same bins in $N_{\rm gal}^{R_{200}}$ as used
with measurements of the stacked PVD histograms. In the mock catalogs, note the 5-10% downward bias of the geometric mean velocity
dispersion, as determined by the stacked PVD histogram, with respect to the true values.



magnitude-limited sample of the SDSS spectroscopy. We 
can investigate selection effects by examining our model 
in both magnitude- and volume-limited samples. The 
volume-limited samples are constructed by extracting all
galaxies brighter than 0.4L* and below the redshift at which
0.4L* reaches the magnitude limit of the SDSS spectroscopy.
Thus, out to this redshift, we are complete above 0.4L* up to fiber
collisions. The differences in the mass mixing parameters
between the volume- and magnitude-limited samples are only slight
and within the one-sigma errors.
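The redshift limit of such a volume-limited sample follows from where a 0.4L* galaxy reaches the spectroscopic flux limit. A rough sketch, assuming a flat ΛCDM cosmology with $\Omega_m = 0.3$, a characteristic magnitude $M^* - 5\log h = -20.44$ (an r-band value from Blanton et al. 2003, used here only as an illustrative assumption), the $r = 17.8$ limit quoted in §5.3.3, and no K-correction or evolution correction, both of which a real selection would include.

```python
import math

DH = 2997.92458  # Hubble distance c/H0 in h^-1 Mpc

def comoving_distance(z, omega_m=0.3, n=1000):
    """Comoving distance (h^-1 Mpc) in flat LCDM via composite Simpson's rule (n even)."""
    if z <= 0.0:
        return 0.0
    f = lambda x: 1.0 / math.sqrt(omega_m * (1.0 + x) ** 3 + 1.0 - omega_m)
    step = z / n
    total = f(0.0) + f(z)
    total += sum((4 if i % 2 else 2) * f(i * step) for i in range(1, n))
    return DH * total * step / 3.0

def apparent_mag(abs_mag, z, omega_m=0.3):
    """m = M + 5 log10(d_L / h^-1 Mpc) + 25, with M meaning M - 5 log h; no K-correction."""
    d_l = (1.0 + z) * comoving_distance(z, omega_m)
    return abs_mag + 5.0 * math.log10(d_l) + 25.0

def z_limit(abs_mag, m_lim, omega_m=0.3, z_lo=1e-4, z_hi=1.0, iters=50):
    """Bisect for the redshift at which an object of abs_mag reaches m_lim."""
    for _ in range(iters):
        z_mid = 0.5 * (z_lo + z_hi)
        if apparent_mag(abs_mag, z_mid, omega_m) < m_lim:
            z_lo = z_mid
        else:
            z_hi = z_mid
    return 0.5 * (z_lo + z_hi)

M_STAR = -20.44                            # assumed M* - 5 log h (illustrative)
M_04 = M_STAR + 2.5 * math.log10(1 / 0.4)  # 0.4 L* is about 1 mag fainter than L*
print(round(z_limit(M_04, 17.8), 3))
```

Under these assumptions the limit lands near $z \approx 0.09$; including K-corrections would shift it somewhat.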

We also binned the volume-limited sample in redshift 
to check for evolution in the scatter. Although there
are only negligible differences in the scatter in mass between
the upper and lower redshift bins, there
is a larger difference between the mean relations for each
redshift bin. This evolution is described in detail in
§6.1 for the full magnitude-limited sample.

Finally, we compare the mixing parameters measured
only with cluster members (with redshifts) defined by the
maxBCG algorithm to those measured with the entire
spectroscopic sample (i.e. the full BGVCF). We find
no significant differences in this test. As suggested earlier,
we might have expected cluster members to better trace
the fully virialized regions of clusters. The absence of a difference
implies that either infalling galaxies do not contribute significantly,
or the radial cut used to select members of the BGVCF was small enough
that most of the infalling galaxies were excluded, except
those directly along our line of sight.

5.3.2. Tests with the Mock Catalogs 

After running the maxBCG cluster finder on the mock 
catalogs, we measure the mass mixing of the identified 
clusters in the same way that it is measured for the 
maxBCG clusters identified in SDSS data. In the bottom
two panels of Figure 8, the mass mixing parameters
computed using the 2GAUSS method with the mass
mixing model for clusters in the higher resolution
simulation are compared to the true relations, found
by matching our clusters to halos and then assigning
each cluster the dark matter velocity dispersion of its
matched halo. We also performed the same analysis in a
lower resolution simulation. We find that we can successfully
predict the mass mixing in both simulations above
their respective mass thresholds, except for the 5-10%
downward bias of the mean value.

The bias in the mean value of the velocity dispersion
in the mock catalogs is due to the imperfect selection
of member galaxies by the maxBCG cluster finding algorithm.
When we select perfectly centered clusters (i.e.
clusters in which the true BCG at rest in the halo is found
as the BCG by the maxBCG cluster finding algorithm)
and repeat the computation of $r_{\rm RM}$, we find that the
random member dispersions still do not increase by $\sqrt{2}$.
Instead, they increase by less than this factor, with
the measured $r_{\rm RM}$ decreasing with $N_{\rm gal}^{R_{200}}$. However, we
can recover the factor of $\sqrt{2}$ if we use only halo centers
and only members within $R_{200}$ of the halo center. Thus,
because we cannot perfectly select members, the quantity
$\sqrt{2}/r_{\rm RM}$ is a little more than unity, so that the BCG bias
correction (in which one divides by $\sqrt{2}\,b_{\rm 2G}/r_{\rm RM}$) makes
the mean too low. We cannot test for this effect in the
data directly, but the simulations indicate that it is less
than 10%.

The matching between clusters and halos is done according
to a slight modification of the method used by
Rozo et al. (2007a,b). The halos are first ranked in order
of richness, highest to lowest. Then, for each halo, the cluster
sharing the most members with the halo is called the match.
If two clusters share the same number of members, the
one containing the halo BCG is taken as the match. If
these two criteria fail to produce a unique match (i.e. no
cluster contains the halo's BCG), the cluster with the
higher richness measure is chosen as the match. Finally,
if all three criteria still fail to produce a unique match,
the matching cluster is chosen at random from all clusters
that meet all three criteria. When a match is made,
both the cluster and the halo are removed from consideration,
and the next highest richness halo is matched in
the same way. This procedure produces unique matches,
but may not match every halo to a cluster or every cluster
to a halo. Of the halos with matched clusters in the
high resolution mock catalogs, we find that the first criterion
fails in only 6.23% of all cases. In these cases,
5.20%, 0.68%, and 0.35% of the matched halos (together the 6.23%)
are matched using the second, third, and fourth criteria, respectively.
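The matching procedure can be written as a greedy loop. The sketch below is illustrative only: the dictionary layout, the requirement that a candidate cluster share at least one member with the halo, and the seeded tie-breaking are assumptions, not details taken from Rozo et al. (2007a,b).

```python
import random

def match_halos_to_clusters(halos, clusters, seed=0):
    """Greedy one-to-one halo-cluster matching (illustrative data layout).

    halos:    {halo_id: {"richness": float, "members": set, "bcg": galaxy_id}}
    clusters: {cluster_id: {"richness": float, "members": set}}
    Returns {halo_id: cluster_id}; halos sharing no members stay unmatched.
    """
    rng = random.Random(seed)
    available = set(clusters)
    matches = {}
    for hid in sorted(halos, key=lambda h: halos[h]["richness"], reverse=True):
        halo = halos[hid]
        shared = {cid: len(halo["members"] & clusters[cid]["members"]) for cid in available}
        shared = {cid: n for cid, n in shared.items() if n > 0}
        if not shared:
            continue
        # Criterion 1: most shared members.
        most = max(shared.values())
        cands = [cid for cid, n in shared.items() if n == most]
        # Criterion 2: prefer the cluster containing the halo's BCG.
        if len(cands) > 1:
            with_bcg = [cid for cid in cands if halo["bcg"] in clusters[cid]["members"]]
            if with_bcg:
                cands = with_bcg
        # Criterion 3: prefer the richest remaining candidate.
        if len(cands) > 1:
            top = max(clusters[cid]["richness"] for cid in cands)
            cands = [cid for cid in cands if clusters[cid]["richness"] == top]
        # Criterion 4: break any remaining tie at random.
        cid = cands[0] if len(cands) == 1 else rng.choice(sorted(cands))
        matches[hid] = cid
        available.discard(cid)  # each cluster can be matched only once
    return matches
```

Because matched clusters are removed before the next halo is processed, richer halos get first claim on shared clusters, which is what makes the resulting matches unique.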

In the SDSS data the use of cluster members only to 
construct the BGVCF caused no change in the amount 
of mass mixing. We repeat this measurement in the 
higher resolution simulation using only the cluster mem- 
bers selected by the maxBCG algorithm to construct the 
BGVCF. We see no significant improvement in the pre- 
diction of the true mass mixing parameters using cluster 
members only as compared to using all galaxies in the 
BGVCF. 

In the mock catalogs, we did not properly replicate the 
selection function of the SDSS spectroscopic sample. Un- 
fortunately, the mock catalogs have approximately half 
the sky coverage area of the SDSS sample, so that when the
proper selection function of the SDSS spectroscopic sam- 
ple is applied, there are too few galaxies to use with our 
methods. We require high signal-to-noise measurements 
of the fourth moment of the BGVCF, which is not pos- 
sible with only half the sky coverage area. However, we 
can test for the effects of spectroscopic selection within 
the C4 sample, as described below. 

5.3.3. Tests with the C4 Catalog 

Using the C4 catalog (Miller et al. 2005), we can perform
an independent test of our mass mixing method.
The C4 catalog is produced by running the C4 cluster
finding algorithm on low redshift SDSS spectroscopic
data. This algorithm finds clusters using their density in
4-D color space and 3-D position space. We make use
of five pieces of information from the C4 catalog: a richness
estimate, an estimate of the velocity dispersion of
each cluster, the BCG redshift, the mean cluster redshift,
and a "Structure Contamination Flag" (SCF). This flag
takes on the values 0, 1, or 2, depending upon the
degree of interaction of a given cluster with any of its
neighbors. Isolated clusters have SCF = 0, and clusters
that have neighbors very close by (i.e. $\Delta z \sim 0.01$) have
SCF = 2. Clusters with SCF = 1 are in between these two
extremes. The mean cluster redshift is the biweight mean
(Beers et al. 1990) redshift of all SDSS spectroscopically
sampled galaxies within 1 $h^{-1}$ Mpc and ±0.02 in redshift
of the centroid found in the PVD histogram of the cluster.

We use every cluster in the C4 catalog with SCF ≠ 2
and in the redshift range 0.03 < z < 0.12. Centering on
the BCGs listed in the C4 catalog, we process the clusters
in the same way we have processed the maxBCG clusters,
i.e. we measure the BGVCF and then apply the 2GAUSS
method with the mass mixing model to the PVD histogram.
Instead of using a projected radius cut of $R_{200}$,
we used a fixed radius of 1 $h^{-1}$ Mpc, because we
have no estimate of the natural radial scaling appropriate
for the C4 richness measure. To have sufficient statistics
for the computation of the mass mixing model, we are limited
to splitting the clusters into two logarithmically spaced bins of richness.
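The stacking step for the C4 clusters can be sketched as collecting BCG-galaxy velocity offsets within a fixed projected radius. This is a simplified sketch under stated assumptions: positions are in degrees with a flat-sky small-angle approximation (no RA wrap-around handling), the comoving distance to each BCG is supplied precomputed, every pair has equal weight, and the ±4000 km s$^{-1}$ velocity window is a hypothetical choice. The offsets use the standard rest-frame conversion $v = c(z_{\rm gal} - z_{\rm BCG})/(1 + z_{\rm BCG})$.

```python
import numpy as np

C_KMS = 299792.458  # speed of light, km/s

def stacked_velocity_offsets(bcg_ra, bcg_dec, bcg_z, bcg_dc, gal_ra, gal_dec, gal_z,
                             r_max=1.0, v_max=4000.0):
    """Stack rest-frame velocity offsets of galaxies around BCGs.

    bcg_dc: comoving distance to each BCG in h^-1 Mpc (precomputed).
    r_max: projected comoving radius cut in h^-1 Mpc; v_max: |v| window in km/s.
    """
    gal_ra, gal_dec, gal_z = (np.asarray(a, dtype=float) for a in (gal_ra, gal_dec, gal_z))
    offsets = []
    for ra0, dec0, z0, dc0 in zip(bcg_ra, bcg_dec, bcg_z, bcg_dc):
        # Flat-sky angular separation in radians.
        dra = (gal_ra - ra0) * np.cos(np.radians(dec0))
        theta = np.radians(np.hypot(dra, gal_dec - dec0))
        r_proj = theta * dc0
        v = C_KMS * (gal_z - z0) / (1.0 + z0)
        # r_proj > 0 drops the BCG itself (assumed to be in the galaxy list).
        keep = (r_proj > 0.0) & (r_proj < r_max) & (np.abs(v) < v_max)
        offsets.append(v[keep])
    return np.concatenate(offsets) if offsets else np.empty(0)
```

The resulting array is the input to a PVD histogram; swapping `bcg_z` for the mean cluster redshift gives the cluster-centered variant discussed below.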

We then compare our inferred distribution of velocity 
dispersions with the distribution of velocity dispersions 
for each individual C4 cluster in the catalog for each bin. 
The results are shown in the upper two panels of Figure 9,
which compares the lognormal with our derived
parameters to the best-fit lognormal for the individual
C4 velocity dispersions. Note the slight bias in the mean
between the lognormal curves computed from the mass
mixing model and the curves fit to the C4 velocity dispersions.

In Figure 10 we repeat our measurements, but using
the cluster redshift instead of the BCG redshift. In this
case we see better agreement between the mass mixing
measurements and the C4 velocity dispersions. This result
indicates that the slight bias in the mean was due to
the movement of the BCGs. Finally, for completeness, we compute
the average velocity dispersion of the BCGs in each
bin of C4 richness, and then use this value to correct the
measurements made using BCG centers. We find that we
can reproduce the cluster redshift measurements through
this procedure. Our understanding of how mis-centering
and/or BCG movement affects our measurements is self-consistent
in both the data and the mock catalogs. If we had
true cluster redshifts for the maxBCG clusters (i.e. an
accurate average redshift of all of the cluster members),
then according to the results of the C4 catalog, no correction
for BCG bias would have to be applied to the
mean velocity dispersion.
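The text does not spell out exactly how the average BCG velocity dispersion is applied. One natural reading, sketched below as an assumption, is that BCG motions are independent of the member velocities, so that the BCG-centered dispersion is inflated in quadrature, $\sigma_{\rm cen}^2 = \sigma^2 + \sigma_{\rm BCG}^2$, and the correction subtracts $\sigma_{\rm BCG}^2$. The BCG peculiar velocity is estimated from the BCG and mean cluster redshifts with the rest-frame conversion; the function names are hypothetical.

```python
import numpy as np

C_KMS = 299792.458  # speed of light, km/s

def bcg_peculiar_velocity(z_bcg, z_cluster):
    """Rest-frame line-of-sight velocity of the BCG relative to the cluster mean (km/s)."""
    z_bcg = np.asarray(z_bcg, dtype=float)
    z_cluster = np.asarray(z_cluster, dtype=float)
    return C_KMS * (z_bcg - z_cluster) / (1.0 + z_cluster)

def remove_bcg_motion(sigma_bcg_centred, v_bcg):
    """Subtract the mean-square BCG velocity in quadrature (assumes independence)."""
    sigma_bcg_sq = np.mean(np.asarray(v_bcg, dtype=float) ** 2)
    diff = sigma_bcg_centred ** 2 - sigma_bcg_sq
    if diff <= 0:
        raise ValueError("BCG motion exceeds the measured dispersion")
    return np.sqrt(diff)
```

In practice `v_bcg` would collect the BCG velocities of all clusters in one richness bin, so the correction is applied bin by bin rather than cluster by cluster.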

We note that there seems to be a high dispersion
tail/shoulder in the histograms plotted in the upper panels
of Figures 9 and 10. Including clusters with
SCF = 2 increases this shoulder. This result is consistent
with the finding of Miller et al. (2005) that clusters with
SCF = 2 have their measured velocity dispersions artificially
increased by their nearby neighbors. This result
is also consistent with the finding of Evrard et al. (2007)
that the velocity dispersions of interacting dark matter
halos form a high tail in the lognormal distribution of
velocity dispersion at fixed mass. In the bottom panels
of Figures 9 and 10 we repeat our measurements using
Fig. 9. — The distribution of velocity dispersion in two bins
of richness for the C4 catalog. Upper panels show the measurement
for all clusters with SCF ≠ 2 (a set that still includes clusters with close neighbors).
Lower panels use all clusters with SCF = 0 (isolated clusters). The
bold lines show the lognormal distributions measured by applying
the 2GAUSS method with the mass mixing model to the C4 clusters
using BCG redshifts in the same way as it is applied to the
maxBCG clusters. The regular lines are lognormal fits directly to
the histograms shown above. Notice that in the bottom panels, the
high velocity dispersion tail due to C4 clusters with close neighbors
is gone.
Fig. 10. — Same as Figure 9, except that the bold lines were made with
the mass mixing model using the mean cluster redshift, not the
BCG redshift. The geometric mean velocity dispersion as measured
by the mass mixing model now agrees with the true geometric mean
of the C4 clusters because we have used cluster redshifts and not
BCG redshifts. Notice again that in the bottom panels, the high
velocity dispersion tail due to C4 clusters with close neighbors is
gone.

cluster redshifts, now including only those clusters with 
SCF = 0 (i.e. we exclude clusters with SCF = 1 or 2).
The distribution of dispersions from this set of clusters 
is in better agreement with the mass mixing method. 

In the maxBCG catalog, interacting clusters may not
be as significant a problem because the clusters are
by definition much farther away from each other (i.e.
$\Delta z > 0.02$ as opposed to $\Delta z \sim 0.01$ for some clusters in
the C4 catalog). In fact, the maxBCG algorithm may group together
many of the clusters that the C4 algorithm would flag as SCF = 2.
We do not mean to imply that the maxBCG algorithm has a significant
problem of over-merging distinct objects, but only that its redshift
resolution is coarser than that of the C4 algorithm. Thus the high
velocity dispersion shoulder would likely be down-weighted by algorithmic
merging of objects in combination with sparse spectroscopic
sampling. For example, if two C4 clusters with SCF = 2
were merged by the maxBCG algorithm, then the
velocity structure would be measured only about a single combined
center, and not about two centers as in the C4 catalog. Therefore, in
combination with the sparse spectroscopic sampling, the relative
weight of these two objects in a maxBCG PVD
histogram might be decreased compared to a C4 PVD
histogram, where they would contribute twice as much
and possibly at higher velocity dispersion. While these
arguments remain untested at the moment, the high velocity
dispersion shoulder seen in the upper panels of
Figures 9 and 10 has a clear origin and is well predicted
theoretically.

For the C4 sample, we can also investigate the effects
of the spectroscopic selection. We recomputed our measurements
using three brighter r-band magnitude limits, 17.0,
16.5, and 16.0. Because the SDSS main sample r-band
magnitude limit is 17.8, these three cuts replicate increasing
amounts of spectroscopic incompleteness. We found
no statistically significant differences between these measurements,
indicating that spectroscopic incompleteness
has a small effect on our measurements for the C4 catalog.

5.3.4. Sensitivity to the Scatter Model 

Here we investigate the sensitivity of our results to the
choice of a lognormal to describe the scatter in $\sigma$ at
fixed $N_{\rm gal}^{R_{200}}$. Other distributions could
possibly describe the scatter just as well. As a test case,
we investigate how well a Gaussian distribution describes
the data in comparison with the lognormal.

We first study whether the two scatter models lead to
different conclusions. In the SDSS data, we find that at
high richness the measured scatter assuming a Gaussian
or a lognormal differs by less than one standard deviation.
At low richness, however, the measured scatter in
the two models differs by almost three standard deviations.
These results cannot tell us which model is better,
only whether the two models are equivalent, which
seems to be the case in the SDSS except at low
richness. From a theoretical standpoint, we prefer the
lognormal model because it always ensures that $\sigma > 0$
without an arbitrary cutoff value.
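A simple moments argument suggests why the two scatter models agree at high richness but diverge at low richness, where the scatter is larger. As a caricature that ignores the background component and the full histogram fit, suppose both models must reproduce the same kurtosis $K$ of a stacked Gaussian scale mixture. For a lognormal, $K = 3e^{4S^2}$; for a Gaussian distribution of $\sigma$ with mean $m$ and standard deviation $fm$, $K = 3(1 + 6f^2 + 3f^4)/(1 + f^2)^2$, which inverts to $f^2 = \sqrt{6/(9 - K)} - 1$. The two "percent scatter" measures ($S$ for the lognormal, $f$ for the Gaussian) are not identical definitions, so only the trend is meaningful here.

```python
import math

def kurtosis_lognormal(s):
    """Kurtosis of a Gaussian scale mixture with ln(sigma) ~ N(mu, s^2)."""
    return 3.0 * math.exp(4.0 * s * s)

def scatter_lognormal(kurt):
    """Invert the kurtosis for the lognormal scatter S in ln(sigma)."""
    return math.sqrt(math.log(kurt / 3.0) / 4.0)

def scatter_gaussian(kurt):
    """Invert the kurtosis for the fractional scatter f of a Gaussian distribution of sigma."""
    if not 3.0 <= kurt < 9.0:
        raise ValueError("a Gaussian scale mixture has 3 <= kurtosis < 9")
    return math.sqrt(math.sqrt(6.0 / (9.0 - kurt)) - 1.0)

for s in (0.15, 0.25, 0.40):
    k = kurtosis_lognormal(s)
    print(f"S = {s:.2f}  K = {k:.3f}  Gaussian f = {scatter_gaussian(k):.3f}")
```

For small scatter the two inversions nearly coincide, but the Gaussian estimate grows much faster than $S$ as the kurtosis increases, consistent with the models differing mainly at low richness.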

In the C4 data the results are more dramatic: the measured
scatter in the two models differs by around
eight to ten standard deviations. This primarily reflects
the high velocity dispersion tail/shoulder in the C4 catalog,
which a lognormal distribution can fit much more easily
than a Gaussian. Such a tail/shoulder may be less prevalent
in the maxBCG clusters for the reasons discussed previously.

In the mock catalogs, the differences between the two
models follow the same trend with richness as in the SDSS data:
the models give quite similar results but diverge at low
richness. Here, we can test which model provides a better
match to the intrinsic dispersion in the catalog. We
find that the scatter derived from the Gaussian model
differs from the true scatter by around four standard deviations,
whereas the scatter derived from the lognormal
model agrees well with the true scatter, as shown in the
lower right panel of Figure 8. We have explicitly verified
that the distribution of velocity dispersion at fixed richness
is approximately lognormal in the mock catalogs.
These results give us some confidence that the mock catalogs
describe the SDSS data well and that a lognormal
is a better approximation to the true distribution than a
Gaussian at all richnesses, but especially at low richness.

5.4. The Velocity Dispersion Number Function 

Using the mass mixing model and the abundance function
of the maxBCG clusters, we can integrate to find
the velocity dispersion function, usually defined as the
number density of clusters per $d\ln\sigma$. This technique
has been used to find the velocity dispersion function of
early-type galaxies in the SDSS (Sheth et al. 2003) and
to estimate the X-ray luminosity function of REFLEX
clusters (Stanek et al. 2006).
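The integration amounts to convolving the richness function with the lognormal scatter at fixed richness, $dn/d\ln\sigma = \sum_N n(N)\,P(\ln\sigma \mid N)$. A minimal sketch, assuming richness is binned, the number densities are already divided by volume, and the mean and scatter relations are supplied as callables; the power-law relation and all numbers in the usage lines are hypothetical placeholders, not our fitted values.

```python
import numpy as np

def velocity_dispersion_function(richness, density, mean_lnsigma, scatter_lnsigma, lnsigma):
    """dn/dln(sigma) evaluated on the grid `lnsigma`.

    richness: richness bin values N; density: number density n(N) per bin;
    mean_lnsigma, scatter_lnsigma: callables returning <ln sigma | N> and S(N).
    """
    N = np.asarray(richness, dtype=float)[:, None]
    n = np.asarray(density, dtype=float)[:, None]
    mu = mean_lnsigma(N)
    s = scatter_lnsigma(N)
    x = np.asarray(lnsigma, dtype=float)[None, :]
    pdf = np.exp(-0.5 * ((x - mu) / s) ** 2) / (np.sqrt(2.0 * np.pi) * s)
    return np.sum(n * pdf, axis=0)  # sum the lognormal kernels over richness bins

# Hypothetical inputs for illustration only.
N = np.arange(10, 101)
n_of_N = 1e-5 * (N / 10.0) ** -2.5
mean_rel = lambda N: np.log(500.0) + 0.4 * np.log(N / 25.0)
scatter_rel = lambda N: 0.3 + 0.0 * N
grid = np.linspace(np.log(50.0), np.log(5000.0), 2001)
phi = velocity_dispersion_function(N, n_of_N, mean_rel, scatter_rel, grid)
```

Because each kernel integrates to one, the integral of `phi` over $\ln\sigma$ returns the total number density, which is a convenient sanity check.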

In the interest of brevity, the result presented here 
is only approximate. We assume a ΛCDM cosmology
for our volume computation. We restrict our analysis
here to only those clusters in the redshift range
0.1 < z < 0.3 (over which the catalog is approximately
volume-limited), but we use the mixing results measured
from the extended catalog. This procedure is justified
because we previously observed no change in the mixing
results using the smaller volume-limited sample. We
do not include any corrections for the selection function
of the catalog, but note that for the maxBCG clusters
the completeness and purity are at or above the 90% level
and approximately richness independent above $N_{\rm gal}^{R_{200}} = 10$
(Koester et al. 2007a,b). Finally, we ignore any possible
redshift evolution of the $N_{\rm gal}^{R_{200}}$ measure.
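The volume of the $0.1 < z < 0.3$ shell follows directly in flat ΛCDM from $V = (\Omega_{\rm survey}/3)\,[D_C(z_2)^3 - D_C(z_1)^3]$, with the solid angle in steradians. The sketch below assumes $\Omega_m = 0.3$ and takes the survey area as an input; the area in the usage line is a placeholder, not a value taken from this paper.

```python
import math

DH = 2997.92458  # c/H0 in h^-1 Mpc

def comoving_distance(z, omega_m=0.3, n=1000):
    """Comoving distance (h^-1 Mpc) in flat LCDM via composite Simpson's rule (n even)."""
    if z <= 0.0:
        return 0.0
    f = lambda x: 1.0 / math.sqrt(omega_m * (1.0 + x) ** 3 + 1.0 - omega_m)
    step = z / n
    total = f(0.0) + f(z)
    total += sum((4 if i % 2 else 2) * f(i * step) for i in range(1, n))
    return DH * total * step / 3.0

def shell_volume(z1, z2, area_deg2, omega_m=0.3):
    """Comoving volume (h^-3 Mpc^3) between z1 and z2 over area_deg2 square degrees."""
    omega_sr = area_deg2 * (math.pi / 180.0) ** 2
    d1, d2 = comoving_distance(z1, omega_m), comoving_distance(z2, omega_m)
    return omega_sr / 3.0 * (d2 ** 3 - d1 ** 3)

FULL_SKY_DEG2 = 4.0 * math.pi * (180.0 / math.pi) ** 2  # about 41253 deg^2
v_shell = shell_volume(0.1, 0.3, area_deg2=7500.0)     # placeholder area
```

Dividing the richness counts by this volume gives the number densities $n(N)$ used in the convolution.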

The velocity dispersion number function (solid curve)
with systematic and statistical errors (gray band) is given
in the left panel of Figure 11. The systematic errors arising
from these approximations could likely be reduced in a
more detailed treatment. We aim here
to demonstrate the feasibility of such an exercise and
note that much more careful consideration of the selection
function can be used to constrain cosmology rather
well (Rozo et al. 2007b). The systematic errors shown
here arise from the selection function, photometric redshift
errors, and evolution in $N_{\rm gal}^{R_{200}}$. We estimate the total
systematic error due to these effects to be approximately 30% in the
velocity dispersion function normalization.
We also include a 10% systematic error in the geometric
mean velocity dispersion due to the uncertain nature
of the BCG bias correction. The statistical error bars
are generated using a Monte Carlo technique, assuming
Poisson errors in the $N_{\rm gal}^{R_{200}}$ number function and using the
covariance matrices of the parameters determined in the
chi-square line fits given by equations 10 and 11. The
statistical errors are too small to be shown alone. Instead,
we plot the systematic error convolved with the
statistical errors as a gray band in the left panel of Figure 11.
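The Monte Carlo error propagation described above can be sketched as follows. Assumptions: the mean relation and the scatter relation are each a straight line in $\ln N$ with a 2×2 parameter covariance (a hypothetical parametrization standing in for equations 10 and 11), the two fits are treated as independent, counts per richness bin are Poisson, and the band is the 16th-84th percentile range. All names and numbers are illustrative.

```python
import numpy as np

def mc_vdf_band(richness, counts, volume, mean_fit, mean_cov, scat_fit, scat_cov,
                lnsigma, n_draws=500, seed=0):
    """16th/84th percentile band of dn/dln(sigma) from Poisson counts and line-fit covariances.

    Hypothetical parametrization: <ln sigma> = a + b ln N and S = c + d ln N.
    """
    rng = np.random.default_rng(seed)
    lnN = np.log(np.asarray(richness, dtype=float))
    x = np.asarray(lnsigma, dtype=float)
    draws = np.empty((n_draws, x.size))
    for i in range(n_draws):
        n = rng.poisson(counts) / volume                      # Poisson-resampled densities
        a, b = rng.multivariate_normal(mean_fit, mean_cov)    # mean-relation parameters
        c, d = rng.multivariate_normal(scat_fit, scat_cov)    # scatter-relation parameters
        mu = a + b * lnN
        s = np.clip(c + d * lnN, 1e-3, None)                  # keep the scatter positive
        pdf = np.exp(-0.5 * ((x[None, :] - mu[:, None]) / s[:, None]) ** 2)
        pdf /= np.sqrt(2.0 * np.pi) * s[:, None]
        draws[i] = n @ pdf
    lo, hi = np.percentile(draws, [16.0, 84.0], axis=0)
    return lo, hi

N = np.arange(10, 101)
counts = np.round(5000.0 * (N / 10.0) ** -2.5)  # hypothetical counts per bin
lo, hi = mc_vdf_band(N, counts, volume=1e8,
                     mean_fit=[5.0, 0.4], mean_cov=[[1e-3, 0.0], [0.0, 1e-4]],
                     scat_fit=[0.5, -0.05], scat_cov=[[1e-3, 0.0], [0.0, 1e-4]],
                     lnsigma=np.linspace(4.0, 8.5, 200))
```

A systematic normalization error, such as the 30% quoted above, could then be folded in separately, for example by widening this band multiplicatively.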

Although a detailed treatment of the velocity disper- 
sion function, which is beyond the scope of this paper, re- 
quires more careful consideration of velocity bias and the 
systematic errors, we provide a preliminary comparison 
to the predictions of the velocity dispersion function for 
three values of the power spectrum normalization. To make
this prediction for our sample of clusters over
the redshift range of the maxBCG catalog, 0.1 < z < 0.3,
we use the full statistical relation between velocity dispersion
and mass determined in dark matter simulations
by Evrard et al. (2007), combined with the Jenkins mass
function (JMF; Jenkins et al. 2001) and its calibration for
galaxy cluster surveys (Evrard et al. 2003). We vary $\sigma_8$
between three values, 0.80, 0.90, and 1.00, while fixing
$\Omega_m = 0.30$. These three curves are plotted as the dashed,
dash-dotted, and dotted lines in the left panel of Figure 11.

We can give a qualitative estimate of the effect of
the selection function of the maxBCG catalog on the
velocity dispersion function. As noted previously, because
the fraction of red galaxies in a cluster decreases
with decreasing cluster mass, the maxBCG catalog may be incomplete
in the lowest mass groups. This incompleteness
would cause our calculation to underestimate the number
density of such low mass groups, as seen in the
left panel of Figure 11 for the lowest velocity dispersion
groups. Note that the low mass deviation of the
data from the predicted velocity dispersion functions occurs
below $\sigma \approx 350$ km s$^{-1}$. This velocity dispersion
is equivalent to $N_{\rm gal}^{R_{200}} \approx 10$, in agreement with the determination
of the selection function by Koester et al.
(2007a,b). Rozo et al. (2007a,b) have shown that at high
richness, the purity of the maxBCG catalog decreases.
This decrease in purity would cause an overestimate of
the number density of the most massive clusters, as seen
in the left panel of Figure 11 for the highest velocity dispersion
groups.

Also in the left panel of this figure, we compare our
data to that of Rines et al. (2006), who compute the velocity
dispersion function using an X-ray selected sample
of local clusters. Rines et al. (2006) define a regular
sample, which excludes low redshift clusters and combines
any multiple X-ray peaks within the clusters into a single
peak. They also define a maximal sample, which includes
all low redshift clusters and counts clusters with multiple
X-ray peaks as two objects. We find good agreement between
our results and those of Rines et al. (2006) for both
samples. Note that the Rines et al. (2006) local sample
of clusters has a median redshift of 0.06, whereas our sample
has a median redshift closer to 0.20. Thus we should
not expect perfect agreement between the two samples
because of evolution in the mass function, but according
to the theoretical calculation, they should agree to well
within 30%.

In the right panel of Figure 11 we repeat the above
procedure using clusters in the high resolution mock
catalogs with redshifts between 0.1 and 0.3 and with
-^ga? — We make no attempts to correct for the selec- 
tion function so that we can crudely estimate its effect. 
The dashed curve with the gray band shows the results of 
this procedure. In order to disentangle systematic errors 
due to the selection function from those due to the mass 
mixing model itself, we remake all of our measurements 
in the high resolution mock catalogs using dark matter 
halo centers instead of the maxBCG cluster centers; the 
dotted curve shows the results. 

To obtain an estimate of the true velocity dispersion
function, we additionally plot two other curves in the
right panel of Figure 11. The solid histogram shows the
true velocity dispersion function computed from all halos
with redshifts between 0.1 and 0.3 to which the
ADDGALS procedure assigned three or more galaxies
within $R_{200}$. The dash-dotted curve shows the ΛCDM
prediction for the velocity dispersion function computed
as discussed above with $\sigma_8 = 0.90$ (the value in the simulation
used for the mock catalog).

All four curves in the right panel of Figure 11 agree
to approximately within one sigma of the histogram error
bars above the threshold of 500 km s$^{-1}$. In the
mock catalogs, this threshold corresponds to approximately
$10^{14}\,h^{-1} M_\odot$ and $N_{\rm gal}^{R_{200}} = 10$, in agreement with
the determinations of the selection function in the mock
catalogs by Rozo et al. (2007b). Note also that the velocity
dispersion function computed using dark matter
halo centers (dotted curve) approximately agrees with
the solid histogram of ADDGALS halos over a large
range in velocity dispersion. The systematic errors in
the mass mixing model alone are therefore small relative to those
in the selection function.

6. DEPENDENCE OF THE VELOCITY DISPERSION ON 
SECONDARY PARAMETERS 

In discussing the scatter model and the corrected velocity
dispersion values, we assumed no other significant
dependencies of the velocity dispersion on parameters
besides $N_{\rm gal}^{R_{200}}$. We address this assumption here through
a variety of theoretically and observationally motivated
tests. In the same way that dark matter halos are primarily
characterized by their mass, we would like to determine
which parameters primarily characterize the velocity
structure of the maxBCG clusters. By splitting the
sample of clusters at a given $N_{\rm gal}^{R_{200}}$ value on secondary parameters
and measuring the velocity dispersion of these
secondary stacks with the 2GAUSS method, we can test
for any dependencies.

6.1. Redshift Evolution 

The dependence of the velocity dispersion on redshift
is shown in the right panel of Figure 12. We find a
modest dependence, with higher redshift clusters having
larger velocity dispersions than lower redshift clusters
of the same richness. Following arguments given
by Evrard et al. (2007), we can roughly estimate whether the
observed redshift dependence is due to evolution in the
$\sigma$-$M$ relationship.

Evrard et al. (2007) have found a robust relation between
the velocity dispersion and mass of dark matter
halos that is constant with redshift and has been tested
with several simulation codes:

$$\sigma_{DM} \propto \left[h(z)\,M_{200c}\right]^{1/3}, \qquad (13)$$

where $\sigma_{DM}$ is the dark matter velocity dispersion, $h(z) =
H(z)/(100\ {\rm km\ s^{-1}\ Mpc^{-1}})$ is the dimensionless Hubble parameter,
and $M_{200c}$ is the mass within a sphere of overdensity
200 times the critical density at redshift $z$. Differentiating
at fixed mass, for a flat ΛCDM background with
$h(z)^2 = h_0^2[\Omega_m(1+z)^3 + 1 - \Omega_m]$ and $h_0 \equiv h(0)$, gives

$$\frac{d\ln\sigma_{DM}}{dz} = \frac{1}{3}\frac{d\ln h(z)}{dz} = \frac{h'(z)}{3h(z)} = \frac{\Omega_m h_0^2 (1+z)^2}{2h(z)^2}.$$

This quantity can be computed exactly, but given the
poor quality of our data when split into three times as
many bins, a rough approximation that uses the median
redshift of our sample is sufficient. At $z = 0.2$,
$h(z=0.2) \approx 0.77$ (for $h_0 = 0.7$ and $\Omega_m = 0.3$), which gives
$d\ln\sigma_{DM}/dz \approx 0.18$. Over the redshift range of our sample,
$\Delta z \approx 0.25$, so the expected change in $\ln\sigma$ is
$\Delta\ln\sigma \approx 0.04$, assuming a constant



velocity bias. This change is too small to account for the 
differences seen in Figure [T^ Therefore we conclude that 
there must be evolution in the $N_{\rm gals}^{r200}$ richness measure. A fractional decrease in $N_{\rm gals}^{r200}$ of 30–40% from the middle redshift bin to the upper redshift bin is consistent with our results. Such an evolution is likely to be a combination of true evolution in the number of galaxies at fixed mass (e.g. Kravtsov et al. 2004; Zentner et al. 2005) and evolution in the definition of the richness estimator at fixed halo occupation. There is evidence from the evolution of richness in both the data (see also Koester et al. 2007a) and the mock catalogs that the current definition of $N_{\rm gals}^{r200}$ does have mild evolution. It may, however, be possible to use a slightly modified richness estimator which does not evolve at fixed mass. We do not explore this possibility further here, but note that velocity dispersion values will be useful in assessing the evolution of the $N_{\rm gals}^{r200}$ measure and attempts to correct for it.
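A minimal sketch of this order-of-magnitude estimate, assuming a flat $\Lambda$CDM background with $\Omega_m = 0.3$ (the value used for Figure 11) and a constant velocity bias. Because $H_0$ cancels in $d\ln h/dz$, the sketch works with $E(z) = H(z)/H_0$; the finite-difference function only cross-checks the closed form, and all function names are illustrative.

```python
import math

def expansion_rate(z, omega_m=0.3):
    """E(z) = H(z)/H0 for an assumed flat LambdaCDM background."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))

def dlnsigma_dz(z, omega_m=0.3):
    """Closed form Omega_m (1+z)^2 / (2 E(z)^2) for d ln sigma_DM / dz at fixed mass."""
    return omega_m * (1.0 + z) ** 2 / (2.0 * expansion_rate(z, omega_m) ** 2)

def dlnsigma_dz_numeric(z, omega_m=0.3, dz=1e-5):
    """Central difference of (1/3) ln E(z); H0 cancels in the logarithmic derivative."""
    up = math.log(expansion_rate(z + dz, omega_m))
    down = math.log(expansion_rate(z - dz, omega_m))
    return (up - down) / (2.0 * dz) / 3.0

z_median, delta_z = 0.2, 0.25
slope = dlnsigma_dz(z_median)
print(f"closed form : {slope:.4f}")
print(f"finite diff : {dlnsigma_dz_numeric(z_median):.4f}")
print(f"Delta ln sigma over Delta z = {delta_z}: {slope * delta_z:.3f}")
```

Multiplying the slope by the redshift span gives the expected shift in $\ln\sigma$ from the $\sigma$–$M$ relation alone, which can be compared directly with the offsets between redshift bins.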

We finally note that the observed evolution could have other explanations as well. Above redshift $\sim 0.12$, the spectroscopic sample is dominated by LRGs. A relative velocity bias between galaxies of different colors and/or luminosities in clusters (e.g. Biviano et al. 1992) could potentially be the cause of the observed evolution.

6.2. Environmental Dependence and Local Density 

Considerable attention has been devoted to the environmental dependence of the velocity dispersion in N-body simulations (e.g. Sheth & Diaferio 2001). It has been found that the velocity dispersion does depend on local density, but only because massive halos tend to occupy denser environments (i.e. halo bias) and the velocity dispersion is strongly correlated with halo mass. No direct dependence of the velocity dispersion on the local density has been found (e.g. Sheth & Diaferio 2001).

In order to test this prediction, we construct four indicators of local density: the $N_{\rm gals}^{r200}$ of the closest cluster, the projected distance to the closest cluster, the total number of cluster members of any cluster within $5\,h^{-1}$ Mpc and $\pm 0.04$ in redshift, and the total number of clusters within $5\,h^{-1}$ Mpc and $\pm 0.04$ in redshift. This redshift cut is chosen to be twice the redshift cut used in the maxBCG percolation process, so that only clusters from one redshift slice on either side of the cluster under consideration are used.

We complete the test by comparing two binning schemes. First, we bin in $N_{\rm gals}^{r200}$ and then on the lower 25%, middle 50%, and upper 25% quantiles of the local density parameters within each $N_{\rm gals}^{r200}$ bin. This method should roughly account for the mass trend with cluster richness before comparing local environments. Second, we reverse the binning order, using the lower 25%, middle 50%, and upper 25% quantiles of the $N_{\rm gals}^{r200}$ distribution in the second step. If the velocity dispersion depends on local density only through the tendency of massive halos to occupy denser environments, then the relations in this binning scheme should be constant for a given $N_{\rm gals}^{r200}$ bin. Note, however, that if the halo occupation itself correlates with local density at fixed mass, then our test could be significantly biased.
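A minimal sketch of the first binning scheme, assuming cluster richness and one local-density indicator are available as plain arrays; the function name, the bin edges, and the synthetic catalog are hypothetical, and stacking the BGVCF within each (bin, quantile) class is not shown.

```python
import numpy as np

def quantile_split(primary, secondary, primary_edges):
    """Label clusters by (primary bin, secondary quantile class).

    Class 0 = lower 25%, 1 = middle 50%, 2 = upper 25% of `secondary`,
    computed separately inside each primary bin. Clusters outside the
    primary edges get bin and class -1.
    """
    primary = np.asarray(primary, dtype=float)
    secondary = np.asarray(secondary, dtype=float)
    n_bins = len(primary_edges) - 1
    pbin = np.digitize(primary, primary_edges) - 1
    pbin[(pbin < 0) | (pbin >= n_bins)] = -1
    qclass = np.full(primary.shape, -1, dtype=int)
    for b in range(n_bins):
        in_bin = pbin == b
        if not in_bin.any():
            continue
        s = secondary[in_bin]
        q25, q75 = np.percentile(s, [25, 75])
        qclass[in_bin] = np.where(s <= q25, 0, np.where(s < q75, 1, 2))
    return pbin, qclass

# Usage with a hypothetical catalog: richness bins first, then density quantiles.
rng = np.random.default_rng(0)
richness = rng.uniform(10.0, 100.0, 2000)
n_neighbors = rng.poisson(5.0, 2000)        # e.g. clusters within 5 Mpc/h and +-0.04 in z
edges = np.logspace(1.0, 2.0, 5)            # four logarithmic richness bins
pbin, qclass = quantile_split(richness, n_neighbors, edges)
```

The second scheme is the same call with the roles swapped, i.e. binning in the density indicator and splitting richness into quantiles within each density bin. With discrete indicators such as neighbor counts, ties at the quartiles make the classes only approximately 25/50/25.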

Using this technique, we find little significant dependence of the velocity dispersion on any of our measures of local density. Figure 13 shows the results of this test for one of the parameters, the total number of clusters within $5\,h^{-1}$ Mpc and $\pm 0.04$ in redshift. We have tested these results for fixed bins in $N_{\rm gals}^{r200}$ and the local density parameters, finding that they are robust. We applied these tests to the mock catalogs as well, producing similar results.

6.3. Multiple Bright Members and BCG i-band Luminosity

It has been shown that clusters which have undergone recent mergers and show significant substructure are not virialized (e.g. Iguchi et al. 2005; Diaferio & Geller …) and that their velocity dispersions are increased above the expectation for their mass (e.g. Cortese et al. 2004; Halliday et al. 2004; Girardi et al. 2005). One might expect that a cluster with multiple bright members that resemble the BCG has undergone a recent merger or has significant substructure. However, it has also been shown that dark matter halos which form earlier at fixed mass have brighter, redder central subhalos (i.e. brighter, redder BCGs) and lower richness (e.g. Wechsler et al. 2006; Croton et al. 2007). Note that a brighter BCG at fixed mass for earlier forming halos likely corresponds to a larger magnitude difference between the BCG and the other member galaxies.

We repeat the first binning scheme used above with the i-band magnitude difference between the BCG and the next brightest cluster member as the secondary parameter. For each $N_{\rm gals}^{r200}$ bin, the distribution of this magnitude difference is split into its lower 25%, middle 50%, and upper 25% quantiles. The naive expectation that clusters with more than one bright member might have undergone a recent merger or have significant substructure is not borne out by the velocity dispersions, which show no significant increase. However, because the computations are done at fixed richness, it could be that late-forming halos, which have higher richness for their mass, also have higher velocity dispersions for their mass because they merged recently, so that the two effects conspire to roughly cancel each other. Unfortunately, this hypothesis is difficult to test observationally.

One can similarly test whether the velocity dispersion, and hence the cluster mass, depends on the luminosity of the BCG alone. In the lower right panel of Figure 13 we plot the velocity dispersion of the clusters first binned in $N_{\rm gals}^{r200}$ and then in the absolute i-band magnitude of the BCG, using the lower 25%, middle 50%, and upper 25% quantiles within each $N_{\rm gals}^{r200}$ bin. We see a dependence on this parameter: at fixed richness, clusters with more luminous BCGs have higher velocity dispersions. The same effect is observed in the stacked X-ray measurements of these same clusters (Rykoff et al. 2007, in preparation): clusters with brighter BCGs have, on average, more X-ray emission. These observations indicate that BCG luminosity may contain additional information about cluster mass beyond that in $N_{\rm gals}^{r200}$ alone.

Fig. 11. The velocity dispersion number function with no corrections for the selection function of the maxBCG clusters (left) and its reconstruction in the high-resolution mock catalogs (right). Left: The theoretical predictions for the velocity dispersion number function combine the Jenkins mass function with the N-body calibrated relation between mass and velocity dispersion of Evrard et al. (2007), for three values of $\sigma_8$. Each assumes $\Omega_m = 0.30$. The circles and squares show the results of Rines et al. (2006) for their regular and maximal local X-ray selected cluster samples. The solid line is the velocity dispersion function of the maxBCG clusters. The gray band indicates the systematic errors, neglecting any corrections for the selection function, convolved with the statistical errors in our measurements. Above approximately 1000 km s$^{-1}$ the data are extrapolated. Right: The dashed line with gray band errors shows the velocity dispersion function computed from clusters in the high-resolution mock catalogs in exactly the same way as for the maxBCG catalog, neglecting any corrections for the selection function. To estimate the systematic errors in the mass mixing model alone, we compute the velocity dispersion function using halo centers (dotted line) instead of BCGs. The solid histogram shows the velocity dispersion function of all halos between redshifts of 0.1 and 0.3 to which the ADDGALS algorithm assigned three or more galaxies within $1\,R_{200}$. This is compared with the $\Lambda$CDM prediction for the velocity dispersion function computed as in the left panel with $\sigma_8 = 0.90$ (the value in the simulation used for the mock catalog).

Fig. 12. The dependence of $\exp(\langle\ln\sigma\rangle)$ at fixed richness on the BCG photometric redshift. The dependence of the velocity dispersion on redshift likely indicates evolution in the $N_{\rm gals}^{r200}$ measure. The solid lines are the best-fit power laws to the highest and lowest redshift bins.

Fig. 13. Dependence of the velocity dispersion on secondary parameters. Upper left: The histogram of the total number of clusters within $5\,h^{-1}$ Mpc and $\pm 0.04$ in redshift of each BCG in the BGVCF. Upper right: The velocity-dispersion–$N_{\rm gals}^{r200}$ relation in bins of the number of clusters in a projected volume around each BCG. Lower left: The relation between velocity dispersion and the number of clusters in a projected volume around each BCG, in bins of $N_{\rm gals}^{r200}$. Lower right: The dependence of the velocity-dispersion–$N_{\rm gals}^{r200}$ relation on the absolute i-band magnitude of the BCG. No secondary parameter dependence would result in constant relations in the lower left panel and completely parallel relations with the same normalization in the upper right panel, assuming the halo occupation does not correlate with local density at fixed mass.

In the case of BCG i-band luminosity, the expected correlation mentioned above is consistent with the observations: early-forming halos would have brighter BCGs and lower richness, so that at fixed richness, halos with brighter BCGs tend to be more massive. However, for this explanation to be consistent with the previous hypothesis concerning the magnitude difference between the BCG and the next brightest cluster member, we must assume that whether or not a halo has recently undergone a major merger correlates more strongly with the magnitude difference between the BCG and the next brightest member galaxy than with the BCG i-band luminosity alone. In other words, we need to assume that the BCG i-band luminosity is not indicative of a recent major merger (which would cause the velocity dispersion to be overestimated at fixed mass), even though BCG i-band luminosity correlates with formation time.

6.4. Cluster Concentration and Radial Dependence 

Although we cannot measure the true mass concentration, we investigate the dependence of the velocity-dispersion–richness relation on the galaxy concentration, measured here by the ratio of the number of cluster members (determined by the maxBCG cluster finder, not the number of pairs in the BGVCF) within $0.2\,R_{200}$ to the number of members within $R_{200}$. We see no dependence of the velocity dispersions on this parameter when the dependence on $N_{\rm gals}^{r200}$ is accounted for first. When one bins directly on this ratio, the velocity dispersion decreases with increasing concentration.

Finally, we investigate the dependence of the velocity dispersion on cluster radius. The scaling of $\sigma$ with radius is measured in logarithmic bins of $N_{\rm gals}^{r200}$. In general, the dispersion stays constant or decreases with radius. This is consistent with the results from previous studies (e.g. Rines et al. 2003; Rines & Diaferio 2006).

7. CONNECTING VELOCITY DISPERSION TO MASS 

Using velocity measurements to probe cluster masses has a long history in astronomy; the virial theorem was the earliest tool used to determine cluster masses (e.g. Zwicky 1933, 1937; Smith 1936) and remains in use today (e.g. Girardi et al. 1998; Struble & Rood 1991; Rines et al. 2003, 2006; Rines & Diaferio 2006). Other methods for determining cluster masses, such as the projected mass estimator (… 2003; Rines et al. 2003; Katgert et al. 2004) and the caustic method (Diaferio & Geller 1997; Diaferio 1999), have also been widely applied (e.g. Biviano & Girardi 2003; Rines et al. 2003, 2006; Diaferio et al. 2005; Rines & Diaferio 2006).

In order to connect the velocity-dispersion–richness relation to a mass–richness relation, we use the recent results of Evrard et al. (2007), who found a dark matter virial relation which appears to hold at all redshifts and for a wide range of cosmologies. Evrard et al. (2007) used a suite of dissipationless simulations run with a range of simulation codes and resolutions to measure the velocity dispersion of dark matter particles at fixed mass. They find that the dark matter virial relation can be characterized as a power law,



$$M_{200c} = \frac{10^{15}\,M_\odot}{h(z)}\left(\frac{\sigma_{\rm DM}}{\sigma_{15}}\right)^{1/\alpha}, \qquad (15)$$

where $h(z) = H(z)/(100\ {\rm km\ s^{-1}\ Mpc^{-1}})$ is the dimensionless Hubble parameter and $M_{200c}$ is the mass within a sphere of overdensity 200 times the critical density at redshift $z$. The values of the fit parameters for the mean relation are found to be $\sigma_{15} = 1084 \pm 13$ km s$^{-1}$ and $\alpha = 0.3359 \pm 0.0045$.
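A minimal sketch of this power law and its forward form, using only the central fit values quoted above and ignoring both their uncertainties and the small lognormal scatter; $M_{200c}$ is in $M_\odot$, $h(z)$ is dimensionless, and the function names are illustrative.

```python
SIGMA_15 = 1084.0  # km/s, central value quoted above
ALPHA = 0.3359     # central value quoted above

def sigma_dm(m200c, hz):
    """Dark matter velocity dispersion (km/s) for M200c in Msun at dimensionless h(z)."""
    return SIGMA_15 * (hz * m200c / 1e15) ** ALPHA

def m200c_from_sigma(sigma, hz):
    """Inverse relation: M200c = (1e15 Msun / h(z)) * (sigma / sigma_15)**(1/alpha)."""
    return 1e15 / hz * (sigma / SIGMA_15) ** (1.0 / ALPHA)

hz = 0.77  # h(z) near z = 0.2 for an assumed H0 = 70 km/s/Mpc
for sigma in (300.0, 600.0, 900.0):
    print(f"sigma = {sigma:5.0f} km/s -> M200c = {m200c_from_sigma(sigma, hz):.2e} Msun")

# Because 1/alpha is close to 3, a 10% change in sigma is roughly a 30% change in mass.
print(f"mass ratio for +10% sigma: {1.1 ** (1.0 / ALPHA):.2f}")
```

The last line illustrates why small biases in the measured velocity dispersion matter: they are amplified by roughly a factor of three in the inferred mass.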

Evrard et al. (2007) additionally found that the scatter of velocity dispersion at fixed mass is well fit by a lognormal with a small scatter of only $0.0402 \pm 0.024$. However, the lognormal scatter in velocity dispersion at fixed mass does not directly relate to the scatter in mass at fixed velocity dispersion without assuming the shape of the halo mass function. In light of this difficulty, and the small scatter in the relation, we take the mean power-law relation given by Evrard et al. (2007) to be a completely deterministic relation.

As there is still substantial theoretical uncertainty in velocity bias, it will be a primary driver of the systematic error in the $N_{\rm gals}^{r200}$–mass relationship. To avoid this uncertainty, we constrain a combination of velocity bias and mass, as described below. We use the standard definition of velocity bias, $b_v = \sigma_{\rm gal}/\sigma_{\rm DM}$, where $\sigma_{\rm gal}$ is the galaxy velocity dispersion and $\sigma_{\rm DM}$ is the dark matter velocity dispersion. The virial relation for galaxy velocity dispersions then becomes

$$b_v^{1/\alpha}\,M_{200c} = \frac{10^{15}\,M_\odot}{h(z)}\left(\frac{\sigma_{\rm gal}}{\sigma_{15}}\right)^{1/\alpha}, \qquad (16)$$

where the quantity $b_v^{1/\alpha} M_{200c}$ parameterizes our lack of knowledge about velocity bias.

To use this relation with the maxBCG clusters, we calculate $\langle\sigma^{1/\alpha}\rangle$ using the measured lognormal distribution of $\sigma$ in each bin. The result is

$$\langle\sigma^{1/\alpha}\rangle = \exp\left(\frac{\langle\ln\sigma\rangle}{\alpha} + \frac{S^2}{2\alpha^2}\right),$$

where $\langle\ln\sigma\rangle$ and $S^2$ are given by the 2GAUSS fits for each $N_{\rm gals}^{r200}$ bin (i.e. equations [8] and [9], respectively). This value is then substituted for $\sigma_{\rm gal}^{1/\alpha}$ in equation [16]. To account for the factor of $h(z)$, we repeat the 2GAUSS fits for each $N_{\rm gals}^{r200}$ bin, weighting each pair in the BGVCF by $1/h(z)^{\alpha}$. The inclusion of this factor has a negligible effect on the observed evolution in § 6.1. We include the correction for BCG bias by dividing our results by the average BCG bias factor raised to the $1/\alpha$ power.
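When $\ln\sigma$ is normally distributed with mean $\langle\ln\sigma\rangle$ and variance $S^2$, the average of $\sigma^{1/\alpha}$ has the closed form $\exp(\langle\ln\sigma\rangle/\alpha + S^2/2\alpha^2)$. The sketch below checks this against a brute-force Monte Carlo average and shows how much the scatter term raises the result compared with simply raising $\exp\langle\ln\sigma\rangle$ to the $1/\alpha$ power. The inputs are illustrative, not the measured values.

```python
import numpy as np

def mean_sigma_power(mean_ln_sigma, s2, alpha):
    """<sigma**(1/alpha)> when ln(sigma) ~ Normal(mean_ln_sigma, s2)."""
    k = 1.0 / alpha
    return np.exp(k * mean_ln_sigma + 0.5 * k**2 * s2)

# Illustrative inputs, not measured values.
mean_ln_sigma, s2, alpha = np.log(500.0), 0.3**2, 0.3359

rng = np.random.default_rng(42)
sigma_draws = np.exp(rng.normal(mean_ln_sigma, np.sqrt(s2), size=2_000_000))
monte_carlo = np.mean(sigma_draws ** (1.0 / alpha))
analytic = mean_sigma_power(mean_ln_sigma, s2, alpha)
no_scatter = np.exp(mean_ln_sigma) ** (1.0 / alpha)   # ignores the lognormal scatter

print(f"analytic / Monte Carlo = {analytic / monte_carlo:.4f}")
print(f"scatter boost over exp(<ln sigma>)^(1/alpha) = {analytic / no_scatter:.3f}")
```

With $1/\alpha \approx 3$, even a moderate scatter of 0.3 in $\ln\sigma$ raises the mean of $\sigma^{1/\alpha}$ by roughly 50%, so neglecting the scatter would noticeably underestimate the mass.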

We apply this method first to the mock catalogs, to determine whether we recover an unbiased estimate of the mass–richness relation. In the left panel of Figure 14 we show the results of this procedure applied to the mock catalogs. The best-fit power law is plotted as the solid curve. For comparison, the mean mass of a cluster in each $N_{\rm gals}^{r200}$ bin, computed as the mean $M_{200c}$ mass of the halos matched to the clusters within the bin, is shown as the diamonds. We again see the slight over-correction of the BCG bias correction. Because the mass is proportional to $\sigma_{\rm gal}^{1/\alpha}$, the masses we measure in the mock catalogs are too low by $\sim$15–25%. The dashed line in the left panel of Figure 14 shows the best-fit power-law relation between $N_{\rm gals}^{r200}$ and mass in the simulation, but with the normalization increased by 25%. Note that in the mock catalogs the velocity bias is defined to be unity, so that equation [16] should be exact with $b_v = 1$ and $\sigma_{\rm gal} = \sigma_{\rm DM}$.

Fig. 14. The dark matter virial relation applied to our stacks of clusters in the mock catalogs (left, circles) and in the data on maxBCG clusters (right, circles). The mean $M_{200c}$ masses for matched halos in each bin are plotted for the mock catalogs on the left (diamonds). The solid line shows the best-fit power-law relationship between $b_v^{1/\alpha} M_{200c}$ and $N_{\rm gals}^{r200}$. The dashed lines show the mass normalization shifted upward by 25% to account for the over-correction of the BCG bias correction.

The results for the maxBCG clusters are given in the right panel of Figure 14. The error bars include the theoretical uncertainties in both $\alpha$ and $\sigma_{15}$. The theoretical uncertainties increase the error bars in our mass determinations by the same fixed factor in every bin. The best-fit power law for the mass–$N_{\rm gals}^{r200}$ relation is

$$b_v^{1/\alpha} M_{200c} = \left(1.18^{+\ldots}_{-\ldots}\right)\times 10^{14}\,h^{-1}\,M_\odot\left(\frac{N_{\rm gals}^{r200}}{25}\right)^{1.15\pm0.12},$$

and is shown as the solid line in the right panel of Figure 14. The dashed line shows the approximate effect of the BCG bias over-correction by shifting the mass normalization up by 25%.
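A minimal sketch that evaluates this best-fit relation at its central values only (the fit uncertainties are not propagated) and converts it to $M_{200c}$ for an assumed velocity bias, using $M_{200c} = (b_v^{1/\alpha}M_{200c})/b_v^{1/\alpha}$ with the central $\alpha$ quoted earlier. The optional factor of 1.25 mimics the dashed-line shift for the BCG bias over-correction; all function and parameter names are hypothetical.

```python
NORM = 1.18e14   # h^-1 Msun, central value of the best-fit normalization
SLOPE = 1.15     # central value of the best-fit slope
PIVOT = 25.0     # richness pivot of the fit

def bv_mass(n_gals):
    """b_v^(1/alpha) M200c (h^-1 Msun) from the best-fit power law, central values only."""
    return NORM * (n_gals / PIVOT) ** SLOPE

def mass_given_bias(n_gals, b_v=1.0, alpha=0.3359, bcg_shift=1.0):
    """M200c for an assumed velocity bias b_v; bcg_shift=1.25 mimics the +25% shift."""
    return bcg_shift * bv_mass(n_gals) / b_v ** (1.0 / alpha)

for n in (10, 25, 60):
    print(n, f"{mass_given_bias(n):.2e}", f"{mass_given_bias(n, b_v=1.15):.2e}")
```

Because the exponent $1/\alpha$ is close to 3, a velocity bias only modestly above unity lowers the implied mass substantially, which is why the mass normalization and velocity bias remain degenerate in this analysis.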

8. CONCLUSIONS AND FUTURE PROSPECTS 

In this paper we have presented new measurements of the BCG-galaxy velocity correlation function (BGVCF) for a sample of clusters identified from the SDSS with the maxBCG algorithm (Koester et al. 2007a,b). Through careful modeling of the shape of the BGVCF, we have measured the mean and scatter in velocity dispersion at fixed $N_{\rm gals}^{r200}$. We find that the mean velocity dispersion at fixed $N_{\rm gals}^{r200}$ is well described by a power law. The mean velocity dispersion increases from $202 \pm 10$ km s$^{-1}$ for small groups to more than $854 \pm 102$ km s$^{-1}$ for large clusters. The scatter in velocity dispersion at fixed $N_{\rm gals}^{r200}$ is at most $40.5 \pm 3.5\%$ and falls to $14.9 \pm 9.4\%$ as $N_{\rm gals}^{r200}$ increases. We test our methods on both the C4 cluster catalog and on mock catalogs. Although there may be a slight 5–10% downward bias in the mean velocity dispersion due to the corrections made for BCG bias, our method successfully recovers the true scatter in both of these data sets with little bias.

The method presented here for measuring the scatter depends on two assumptions: (1) the Gaussianity of the PVD histogram of a stacked set of clusters with similar velocity dispersion, and (2) the lognormal shape of the distribution of velocity dispersion at fixed richness. While the first assumption is valid in cluster samples produced by running the cluster finder on realistic mock catalogs, it is hard to test directly with observations. Simulations with galaxies based on resolved dark matter subhalos may clarify this issue. The second assumption is directly supported in the mock catalogs and by independent observations with the C4 catalog (Miller et al. 2005).

In addition to the measurement of the mean and scatter in the velocity-dispersion–richness relation, we explore the dependence of the velocity dispersion on parameters secondary to richness. The velocity dispersion appears to be significantly affected by the i-band luminosity of the BCG. We also see a dependence of the velocity dispersion on redshift and local density. While the correlation between $N_{\rm gals}^{r200}$, velocity dispersion, and BCG i-band luminosity may be a true physical effect, we interpret the correlations of $N_{\rm gals}^{r200}$ and velocity dispersion with redshift and local density as unphysical, systematic effects of the maxBCG cluster finder. Ultimately, the best way to estimate cluster mass may be to use multiple observables in combination. By comparing different parameters and their relation to velocity dispersion, as done in this paper, we will be able to determine which observables correlate significantly with mass.

Our methods, in combination with weak lensing mass profiles measured for stacked maxBCG clusters (Sheldon et al. 2007, in preparation; Johnston et al. 2007, in preparation) and the radial phase-space information contained in the BGVCF, will allow for precise determinations of the velocity bias and the anisotropy of galaxy orbits in clusters. Precise measurements of these quantities will help to constrain current theoretical models of galaxy clustering and the velocity bias between dark matter and galaxies.

This work also demonstrates the feasibility of using our methods to measure the velocity dispersion function. The velocity dispersion function computed in this paper agrees with the results of Rines et al. (2006). However, given the current estimated systematic errors in our computation (due to the selection function, photometric redshift errors, evolution in $N_{\rm gals}^{r200}$, and the BCG bias correction), we are unable to reach any strong conclusions about the magnitude of $\sigma_8$. With this caveat, we do see from Figure 11 that our results support a higher value of $\sigma_8$ than the most recent CMB+LSS estimates (e.g. Spergel et al. 2007; Tegmark et al. 2006), as recently suggested by other analyses (e.g. Buote et al. 2006; Evrard et al. 2007; Rozo et al. 2007b). This conclusion is, however, degenerate with velocity bias: a velocity bias of approximately 1.1–1.2 could equally well explain our results.

The methods presented in this paper are a significant advancement for the use of optical cluster surveys to determine cosmology. Our method can fully characterize the velocity-dispersion–richness relation for any optical cluster survey with a large spectroscopic sample. Future redshift surveys with more galaxy redshifts will allow for more precise measurements of this scatter. Specifically, because the SDSS spectroscopy is mostly at or below $z \sim 0.1$, a higher-redshift spectroscopic sample would allow for further tests of any redshift dependence.

The measurement of the scatter in mass–observable relations is key to the measurement of cosmology from galaxy cluster surveys and self-calibration schemes (Lima & Hu 2004, 2005). By adding an additional piece of observational information, the methods developed here will help tighten constraints on, and lift degeneracies in, current estimates of cosmology.
APPENDIX

A. GROUP WEIGHTED EM-ALGORITHM FOR 1-DIMENSIONAL GAUSSIAN MIXTURES WITH EQUAL MEANS 

Here we present the modification of the standard EM algorithm (Dempster et al. 1977) for Gaussian mixture models that is used to fit the PVD histograms. It has the advantage of weighting clusters (or groups) of points evenly and of constraining all of the Gaussians to share a common mean. We also verify that the algorithm works using simple numerical experiments. The derivation and notation given here are those of Connolly et al. (2000), with our changes noted.

Let $j$ index the Gaussians in the model, $i$ index the data points, and $N_i$ be the total number of data points in the group from which the $i$th data point is drawn. The statistical model for the entire data set will be a sum of Gaussians plus a single constant background component. We differ from Connolly et al. (2000) by presenting the derivation of the algorithm with the background component included. Letting the Gaussian components be given as

$$\phi(x,\mu,\sigma_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left[-\frac{(x-\mu)^2}{2\sigma_j^2}\right], \qquad (A1)$$

where $\mu$ is the common mean of all of the model components and $\sigma_j$ is the standard deviation of the $j$th Gaussian, and the background component, $U(x)$, be given as

$$U(x) = \frac{1}{2L}, \qquad (A2)$$

where $2L$ is the range of all of the data, $X$, we can write the model as

$$P(X\,|\,\mu,\sigma_1,\ldots,\sigma_J) = p_0\,U(x) + \sum_j p_j\,\phi(x,\mu,\sigma_j), \qquad (A3)$$

where the $p_j$'s are the weights for each model component and we require that $p_0 + \sum_j p_j = 1$. This model is exactly that of the standard EM algorithm for Gaussian mixtures. The difference here will be in the structure of the latent variables.

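Let $z_{ij}$ be defined such that $z_{ij} = 1$ if the $i$th data point is in the $j$th Gaussian and $z_{ij} = 0$ otherwise, with $j = 0$ denoting the background. Now we can write the complete data log-likelihood as

$$\mathcal{L} = \sum_i z_{i0}\ln\left[p_0\,U(x_i)\right] + \sum_j\sum_i z_{ij}\left[\ln p_j + \ln\phi(x_i,\mu,\sigma_j)\right]. \qquad (A4)$$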

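Now we perform the expectation step of the algorithm given the current parameter guesses, $\theta$, computing

$$Q \equiv E(\mathcal{L}\,|\,X,\theta) = \sum_i E(z_{i0}\,|\,X,\theta)\ln\left[p_0\,U(x_i)\right] + \sum_j\sum_i E(z_{ij}\,|\,X,\theta)\left[\ln p_j + \ln\phi(x_i,\mu,\sigma_j)\right]$$

$$= \sum_i r_{i0}\ln\left[p_0\,U(x_i)\right] + \sum_j\sum_i r_{ij}\left[\ln p_j + \ln\phi(x_i,\mu,\sigma_j)\right],$$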

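where $r_{ij}$ is now weighted by cluster (or group) to give

$$r_{ij} = E(z_{ij}\,|\,X,\theta) = \frac{1}{N_i}P(x_i \text{ is in group } j\,|\,X,\theta) = \frac{1}{N_i}\frac{P(x_i\,|\,\text{group } j)\,P(\text{group } j)}{\sum_r P(x_i\,|\,\text{group } r)\,P(\text{group } r)} = \frac{p_j\,\phi(x_i,\mu,\sigma_j)}{N_i\left[p_0\,U(x_i) + \sum_r p_r\,\phi(x_i,\mu,\sigma_r)\right]}, \qquad (A5)$$

and for the background

$$r_{i0} = E(z_{i0}\,|\,X,\theta) = \frac{1}{N_i}P(x_i \text{ is in background}\,|\,X,\theta) = \frac{1}{N_i}\frac{P(x_i\,|\,\text{background})\,P(\text{background})}{\sum_r P(x_i\,|\,\text{group } r)\,P(\text{group } r)} = \frac{p_0\,U(x_i)}{N_i\left[p_0\,U(x_i) + \sum_r p_r\,\phi(x_i,\mu,\sigma_r)\right]}. \qquad (A6)$$

In the middle expressions of equations (A5) and (A6), the sum over $r$ runs over all model components, including the background.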
Now we compute the maximization step by maximizing $Q$ over the parameters. Following Connolly et al. (2000), we first rewrite $Q$ into a more manageable form and then maximize it. Using the definition of our model and its components we have



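$$Q = \sum_i r_{i0}\left[\ln p_0 - \ln(2L)\right] + \sum_j\sum_i r_{ij}\left[\ln p_j - \frac{1}{2}\ln 2\pi - \ln\sigma_j - \frac{(x_i-\mu)^2}{2\sigma_j^2}\right]. \qquad (A7)$$

Letting

$$\bar{x} = \frac{\sum_{ij} r_{ij}\,x_i/\sigma_j^2}{\sum_{ij} r_{ij}/\sigma_j^2} \qquad (A8)$$

and noting that $\sum_{ij} (r_{ij}/\sigma_j^2)(x_i-\bar{x})(\bar{x}-\mu) = 0$, we get

$$\sum_{ij}\frac{r_{ij}}{\sigma_j^2}(x_i-\mu)^2 = \sum_{ij}\frac{r_{ij}}{\sigma_j^2}\left[(x_i-\bar{x})^2 + (\bar{x}-\mu)^2\right]. \qquad (A9)$$

Now we rewrite $Q$ as

$$Q = \sum_i r_{i0}\left[\ln p_0 - \ln(2L)\right] + \sum_j\sum_i r_{ij}\left[\ln p_j - \frac{1}{2}\ln 2\pi - \ln\sigma_j - \frac{(x_i-\bar{x})^2 + (\bar{x}-\mu)^2}{2\sigma_j^2}\right]. \qquad (A10)$$

The maximum of $Q$ subject to $p_0 + \sum_j p_j = 1$ is then given by letting $\mu = \bar{x}$ (equation [A8], evaluated with the current $\sigma_j$) and the rest of the parameters by

$$p_j = \frac{\sum_i r_{ij}}{\sum_i\sum_{r=0}^{J} r_{ir}}, \qquad \sigma_j^2 = \frac{\sum_i r_{ij}(x_i-\mu)^2}{\sum_i r_{ij}}, \qquad (A11)$$

where the expression for $p_j$ also holds for the background weight $p_0$.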



The primary differences between the derivation given here and that given by Connolly et al. (2000) are the reweighted expectation values of $z_{ij}$, equations [A5] and [A6], and that the means of all the Gaussian components are fixed to one value (i.e. equation [A8]). The cluster weighting is encoded in the factor of $1/N_i$ in equations [A5] and [A6]. Each cluster of points contributes an equal amount to the parameters of the mixture model, making the total model weighted by cluster, not by point.
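A compact NumPy sketch of these E- and M-steps, assuming one-dimensional velocity offsets with a cluster label for each point. The function name, the deterministic starting widths, and the fixed iteration count (no convergence test) are illustrative choices, not the authors' implementation; the common-mean update weights each point by $r_{ij}/\sigma_j^2$ with the current widths, which is what maximizing $Q$ over $\mu$ at fixed $\sigma_j$ gives.

```python
import numpy as np

def group_weighted_em(x, groups, n_comp=2, n_iter=500):
    """Cluster-weighted EM for a common-mean Gaussian mixture plus a uniform background.

    x: velocity offsets; groups: cluster label of each offset.
    Returns (p, mu, sigma) with p[0] the background weight and p[1:] the Gaussian weights.
    """
    x = np.asarray(x, dtype=float)
    _, inverse, counts = np.unique(groups, return_inverse=True, return_counts=True)
    n_i = counts[inverse].astype(float)                 # N_i: size of the group of point i
    two_l = x.max() - x.min()                           # 2L: range of all the data
    mu = np.average(x, weights=1.0 / n_i)
    sigma = np.std(x) * np.linspace(0.5, 1.5, n_comp)   # deterministic, distinct starts
    p = np.full(n_comp + 1, 1.0 / (n_comp + 1))
    for _ in range(n_iter):
        # E-step: column 0 is the background, columns 1.. are the Gaussians;
        # dividing by N_i makes each group's responsibilities sum to one.
        dens = np.empty((x.size, n_comp + 1))
        dens[:, 0] = p[0] / two_l
        t = (x[:, None] - mu) / sigma
        dens[:, 1:] = p[1:] * np.exp(-0.5 * t**2) / (np.sqrt(2.0 * np.pi) * sigma)
        r = dens / (n_i[:, None] * dens.sum(axis=1, keepdims=True))
        # M-step: weights, common mean weighted by r_ij / sigma_j^2, then widths.
        p = r.sum(axis=0) / r.sum()
        g = r[:, 1:]
        mu = np.sum(g / sigma**2 * x[:, None]) / np.sum(g / sigma**2)
        sigma = np.sqrt(np.sum(g * (x[:, None] - mu) ** 2, axis=0) / g.sum(axis=0))
    return p, mu, sigma

# Two hypothetical clusters (widths 800 and 300), each with ~9% uniform interlopers.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 800.0, 5000), rng.uniform(-3000.0, 3000.0, 500),
                    rng.normal(0.0, 300.0, 500), rng.uniform(-3000.0, 3000.0, 50)])
groups = np.repeat([0, 1], [5500, 550])
p, mu, sigma = group_weighted_em(x, groups)
print("p =", np.round(p, 3), " mu =", round(float(mu), 1), " sigma =", np.round(sigma, 1))
```

Because each synthetic cluster counts equally, the fit should recover widths near 300 and 800 and a background weight near the injected interloper fraction, even though the broad cluster supplies ten times as many points.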

In Figure 15 we plot the point-weighted histogram of data created from 5000 and 500 random draws from two Gaussians with dispersions 800 and 300, respectively, along with the standard EM algorithm fit using 10 components. We also plot the cluster-weighted histogram of the same data, fit with the EM algorithm derived above using 10 components (bold line). The shapes seen in the figure are as expected. In the point-weighted case, the 5000 samples of the 800-width Gaussian dominate the PVD histogram, so it appears to be Gaussian. However, in the cluster-weighted case, the two Gaussians of width 800 and 300 contribute equally, and we explicitly see the non-Gaussianity in the histogram. In effect, by not weighting the histogram by cluster, one can statistically "hide" non-Gaussianity.
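The same effect can be summarized by a single statistic. A minimal sketch, assuming zero-mean Gaussian components, that compares the excess kurtosis of point-weighted and cluster-weighted (weights $1/N_i$) samples; for a zero-mean scale mixture the population value is $3\sum_j w_j\sigma_j^4/(\sum_j w_j\sigma_j^2)^2 - 3$, roughly 0.2 for the 10:1 mixture and 1.7 for equal weights. The sketch uses ten times the sample sizes quoted above to reduce noise, and the function names are illustrative.

```python
import numpy as np

def weighted_excess_kurtosis(x, w):
    """Excess kurtosis of x about its weighted mean, using normalized weights w."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    m = np.sum(w * x)
    m2 = np.sum(w * (x - m) ** 2)
    m4 = np.sum(w * (x - m) ** 4)
    return m4 / m2**2 - 3.0

def mixture_excess_kurtosis(weights, sigmas):
    """Population excess kurtosis of a zero-mean Gaussian scale mixture."""
    w = np.asarray(weights, dtype=float) / np.sum(weights)
    s2 = np.asarray(sigmas, dtype=float) ** 2
    return 3.0 * np.sum(w * s2**2) / np.sum(w * s2) ** 2 - 3.0

rng = np.random.default_rng(0)
sizes, sigmas = [50_000, 5_000], [800.0, 300.0]
x = np.concatenate([rng.normal(0.0, s, n) for n, s in zip(sizes, sigmas)])
n_i = np.repeat(sizes, sizes).astype(float)   # N_i for every point

point = weighted_excess_kurtosis(x, np.ones_like(x))
group = weighted_excess_kurtosis(x, 1.0 / n_i)
print(f"point-weighted: {point:.2f} (population {mixture_excess_kurtosis(sizes, sigmas):.2f})")
print(f"group-weighted: {group:.2f} (population {mixture_excess_kurtosis([1, 1], sigmas):.2f})")
```

A kurtosis near zero in the point-weighted sample would suggest a single Gaussian, while the cluster-weighted value exposes the mixture of widths.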

B. STATISTICAL BIAS IN THE 2GAUSS METHOD 

In this appendix, we describe the statistical bias correction we apply to the 2GAUSS method. Suppose we have random samples, $D = \{d_1, d_2, \ldots, d_N\}$, from a distribution, $p(x)$, characterized by a set of parameters, $z = \{z_1, z_2, \ldots\}$, so that we can write $p(x;z)$. A simple example would be a Gaussian characterized by its mean and variance. From these random samples, we can construct estimators, $\hat{\mu}_i$, of the moments of the distribution, $\mu_i = \int x^i p(x;z)\,dx$. A well-known example of an estimator of $\mu_1$ is the sample mean, $\hat{\mu}_1 = \sum_{k=1}^{N} d_k/N$. An estimator of the $i$th moment of a distribution is said to be unbiased if its expectation over the sampling distribution of $D$ equals the true moment, $\int \hat{\mu}_i(D)\prod_{k=1}^{N} p(d_k;z)\,dd_k = \mu_i(z)$. In some cases the sampling distribution of an estimator can be computed exactly, so that its bias can be computed and corrected for analytically.

Because the 2GAUSS method is rather complicated, instead of attempting to compute the bias analytically, we use a Monte Carlo method to estimate it. For each bin in $N_{\rm gals}^{r200}$, we construct 10,000 Monte Carlo samples of the set of velocity separation values, $v = \{v_1, v_2, \ldots\}$, using the measured parameters $\langle\ln\sigma\rangle$ and $S^2$, the number of clusters in the bin, and the number of samples per cluster in the bin. Then we remeasure $\langle\ln\sigma\rangle$ and $S^2$ for each Monte Carlo sample. From these 10,000 re-estimates of $\langle\ln\sigma\rangle$ and $S^2$, we estimate the bias in the 2GAUSS method. We then use this bias to correct our measurements.

Fig. 15. A simple test of the cluster-weighted EM algorithm. The bold line and histogram are cluster-weighted; the thin line and histogram are point-weighted. The point-weighted histogram can statistically "hide" non-Gaussianity because it allows one component to contribute the majority of the points to the histogram, and thus determine its shape. The cluster-weighted histogram eliminates this issue by forcing each component to contribute equally. See text for details.

We note here that the Monte Carlo tests indicate that the estimates of $\langle\ln\sigma\rangle$ and $S^2$ are correlated in a nontrivial way. The BAYMIX method would naturally account for these correlations in a transparent way. Alternatively, a different, unbiased procedure could be used to estimate $\langle\ln\sigma\rangle$ and $S^2$. One candidate method might be the use of Gauss–Hermite moments (van der Marel & Franx 1993). Given that we already have a well-developed method to estimate $\langle\ln\sigma\rangle$ and $S^2$, we do not explore this possibility here. A detailed understanding of the correlations is beyond the scope of this work, but would be necessary for the use of these results to constrain cosmology precisely.
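A toy version of the Monte Carlo bias correction, as a sketch under stated assumptions: each cluster's $\ln\sigma$ is drawn from a normal distribution with the measured $\langle\ln\sigma\rangle$ and $S^2$, velocities within a cluster are Gaussian, every cluster has the same number of samples, and 2,000 rather than 10,000 replicates are used. The estimator `naive_estimates` is a hypothetical stand-in for 2GAUSS, which is not reproduced here; only the simulate, re-estimate, and subtract logic carries over.

```python
import numpy as np

def naive_estimates(v):
    """Hypothetical stand-in for 2GAUSS: mean and variance over clusters of ln(sample std).

    v has shape (..., n_clusters, n_per_cluster); returns (<ln sigma>_hat, S2_hat).
    """
    ln_s = np.log(np.std(v, axis=-1, ddof=1))
    return ln_s.mean(axis=-1), ln_s.var(axis=-1, ddof=1)

def simulate(mean_ln_sigma, s2, n_clusters, n_per_cluster, size, rng):
    """Lognormal sigma per cluster, Gaussian velocity separations within each cluster."""
    ln_sig = rng.normal(mean_ln_sigma, np.sqrt(s2), size=(size, n_clusters, 1))
    return rng.normal(0.0, 1.0, size=(size, n_clusters, n_per_cluster)) * np.exp(ln_sig)

def mc_bias(mean_ln_sigma, s2, n_clusters, n_per_cluster, n_mc=2000, seed=0):
    """Monte Carlo bias (mean re-estimate minus input) of both estimators."""
    rng = np.random.default_rng(seed)
    v = simulate(mean_ln_sigma, s2, n_clusters, n_per_cluster, n_mc, rng)
    mu_hat, s2_hat = naive_estimates(v)
    return mu_hat.mean() - mean_ln_sigma, s2_hat.mean() - s2

# Usage on one hypothetical "observed" bin: 50 clusters with 10 velocities each.
rng = np.random.default_rng(42)
observed = simulate(np.log(500.0), 0.3**2, 50, 10, 1, rng)[0]
mu_obs, s2_obs = naive_estimates(observed)
b_mu, b_s2 = mc_bias(mu_obs, s2_obs, 50, 10)
print(f"bias in <ln sigma>: {b_mu:+.3f}, bias in S^2: {b_s2:+.3f}")
print(f"corrected: <ln sigma> = {mu_obs - b_mu:.3f}, S^2 = {s2_obs - b_s2:.3f}")
```

Subtracting the mean Monte Carlo bias is a first-order correction that assumes the bias changes slowly with the true parameters. In this toy model the sampling noise in each cluster's $\ln s$ inflates $S^2$ and pulls $\langle\ln\sigma\rangle$ low, which is the kind of effect the correction is meant to remove.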